#+TITLE: Process Meeting Transcript Workflow
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-02-03

* Overview

This workflow covers meeting-recording processing from start to finish: finding recordings, extracting audio, transcribing via AssemblyAI, identifying speakers, correcting errors, and archiving files.

* When to Use This Workflow

Trigger this workflow when:
- Craig says "process the transcript", "process the recording", or similar
- New recording files (.mkv) appear in ~/sync/recordings/ after meetings
- Craig wants to process meeting recordings into labeled transcripts

* Prerequisites

- Recording file(s) exist in ~/sync/recordings/ (*.mkv)
- Calendar files available at ~/.emacs.d/data/*cal.org for meeting titles
- AssemblyAI transcription script at ~/.emacs.d/scripts/assemblyai-transcribe
- AssemblyAI API key stored in ~/.authinfo.gpg (machine api.assemblyai.com)
- ffmpeg available for audio extraction

* The Workflow

** Step 1: Identify Engagement and Write Session Context

Before starting transcript processing:

1. *Identify which engagement this meeting belongs to:*
   - DeepSat (default for current work)
   - Vineti (historical)
   - Salesforce (historical)
   - If unclear, ask Craig

2. *Set destination paths based on engagement:*
   - Assets: ~{engagement}/assets/~ (e.g., ~deepsat/assets/~)
   - Meetings: ~{engagement}/meetings/~ (e.g., ~deepsat/meetings/~)
   - Knowledge: ~{engagement}/knowledge.org~ for reference

3. Update docs/session-context.org with current status:
   - Note that we're about to process a meeting transcript
   - Get the meeting name from ~/.emacs.d/data/*cal.org by matching the calendar date/time to the recording's filename timestamp
   - If the meeting is not in the calendar, ask Craig for the meeting title

4. Ask Craig if he wants to compact the conversation context:
   - Transcript processing can use significant context
   - Compacting now preserves the session context file for recovery

** Step 2: Find Recording Files

Find and match recording files with calendar events:

1. *List recordings:* Find all .mkv files in ~/sync/recordings/
   #+begin_src bash
   ls -la ~/sync/recordings/*.mkv
   #+end_src

2. *Extract timestamps:* Parse the date and time from each filename (format: YYYY-MM-DD_HH-MM-SS.mkv)

3. *Match with calendar:* Check ~/.emacs.d/data/*cal.org for meetings at those times
   #+begin_src bash
   # Search every calendar file, not just dcal.org; grep prefixes each hit with its filename
   grep -A2 "YYYY-MM-DD" ~/.emacs.d/data/*cal.org
   #+end_src

4. *Present selection table to Craig:*
   | Filename                | Meeting / Date-Time             |
   |-------------------------+---------------------------------|
   | 2026-02-03_10-00-00.mkv | DeepSat Standup (from calendar) |
   | 2026-02-03_14-30-00.mkv | 2026-02-03 14:30 (no match)     |

5. *Craig selects files:* One, several, or all files to process

6. *Queue for processing:* Selected files are ordered oldest → newest and processed serially
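
The timestamp parsing and queue ordering above can be sketched in shell. This is a minimal illustration, assuming the YYYY-MM-DD_HH-MM-SS.mkv naming holds for every recording; the function names =recording_timestamp= and =queue_oldest_first= are hypothetical helpers, not part of any existing script.

#+begin_src bash
# Turn a recording filename into a "YYYY-MM-DD HH:MM:SS" string
# that is easy to compare against calendar entries.
recording_timestamp() {
  local name date_part time_part
  name=$(basename "$1" .mkv)
  case $name in
    [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]-[0-9][0-9])
      date_part=${name%%_*}   # text before the underscore
      time_part=${name#*_}    # text after the underscore
      printf '%s %s\n' "$date_part" "$(printf '%s' "$time_part" | sed 's/-/:/g')"
      ;;
    *)
      echo "unexpected filename: $1" >&2
      return 1
      ;;
  esac
}

# Order the queue oldest to newest. The names are zero-padded timestamps,
# so a plain lexical sort is also a chronological sort.
queue_oldest_first() {
  printf '%s\n' "$@" | sort
}
#+end_src

For example, =queue_oldest_first ~/sync/recordings/*.mkv= would list the recordings in processing order, provided they all live in the same directory.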

** Step 3: Extract Audio

For each selected recording file, extract audio for transcription:

#+begin_src bash
ffmpeg -i ~/sync/recordings/FILENAME.mkv -vn -ac 1 -c:a aac -b:a 96k /tmp/FILENAME.m4a
#+end_src

Settings:
- =-vn= : no video (audio only)
- =-ac 1= : mono channel (sufficient for speech, smaller file)
- =-c:a aac= : AAC codec
- =-b:a 96k= : 96kbps bitrate (sufficient for speech transcription)

Output: /tmp/FILENAME.m4a (temporary, deleted after transcription)
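
When several recordings are queued, the extraction step could be wrapped in small helpers like the ones below. This is only a sketch: =audio_path_for= and =extract_audio= are hypothetical names, and the =-nostdin= and =-y= flags are additions to the command above (they stop ffmpeg from reading the terminal and let it overwrite a stale temporary file).

#+begin_src bash
# Map a recording path to its temporary audio path in /tmp.
audio_path_for() {
  printf '/tmp/%s.m4a\n' "$(basename "$1" .mkv)"
}

# Extract mono 96k AAC audio with the settings listed above.
# Prints the output path on success; returns non-zero if ffmpeg fails.
extract_audio() {
  local src=$1 out
  out=$(audio_path_for "$src")
  ffmpeg -nostdin -y -i "$src" -vn -ac 1 -c:a aac -b:a 96k "$out" || return 1
  printf '%s\n' "$out"
}
#+end_src

A call such as =extract_audio ~/sync/recordings/2026-02-03_10-00-00.mkv= would then print =/tmp/2026-02-03_10-00-00.m4a= once ffmpeg finishes.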

** Step 4: Transcribe with AssemblyAI

1. *Run transcription:*
   #+begin_src bash
   ~/.emacs.d/scripts/assemblyai-transcribe /tmp/FILENAME.m4a > ~/sync/recordings/FILENAME.txt
   #+end_src

2. *Clean up:* Delete the intermediate .m4a file after successful transcription
   #+begin_src bash
   rm /tmp/FILENAME.m4a
   #+end_src

3. *Output format:* The script produces speaker-diarized output:
   #+begin_example
   Speaker A: First speaker's text here.
   Speaker B: Second speaker's response.
   Speaker A: First speaker continues.
   #+end_example

4. Continue with Step 5 to verify the files, then move on to speaker identification in Steps 6 and 7.
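
Because the redirect in the transcription command creates the .txt file even when the script fails, a guarded variant may be safer. The sketch below writes to a =.partial= file first and deletes the audio only after a non-empty transcript exists. =transcribe_recording= and the =TRANSCRIBE_CMD= override are hypothetical; it is also an assumption that the transcription script exits non-zero on failure.

#+begin_src bash
# Transcribe AUDIO into TRANSCRIPT; keep the audio if anything goes wrong.
transcribe_recording() {
  local audio=$1 transcript=$2
  # TRANSCRIBE_CMD is a hypothetical override, useful for dry runs.
  local cmd=${TRANSCRIBE_CMD:-$HOME/.emacs.d/scripts/assemblyai-transcribe}
  local tmp="$transcript.partial"
  if "$cmd" "$audio" > "$tmp" && [ -s "$tmp" ]; then
    # Publish the transcript, then remove the intermediate audio.
    mv "$tmp" "$transcript" && rm -f "$audio"
  else
    rm -f "$tmp"
    echo "transcription failed; keeping $audio" >&2
    return 1
  fi
}
#+end_src

With this shape, a failed run leaves no half-written transcript behind and the .m4a file stays available for a retry.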

** Step 5: Locate Files

Confirm the transcript and recording files are ready:

1. *Verify transcript exists:*
   #+begin_src bash
   ls -la ~/sync/recordings/FILENAME.txt
   #+end_src

2. *Verify recording exists:*
   #+begin_src bash
   ls -la ~/sync/recordings/FILENAME.mkv
   #+end_src

3. *Get meeting title:* If not already known from Step 2, check the calendar
   - Calendar location: ~/.emacs.d/data/*cal.org
   - Match the meeting time to the transcript timestamp
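
A scripted version of these checks might also reject empty files, since an empty transcript usually means the transcription failed quietly. This is a minimal sketch; =files_ready= is a hypothetical helper name.

#+begin_src bash
# Succeed only if every given file exists and is non-empty.
# [ -s FILE ] is true when FILE exists and its size is greater than zero.
files_ready() {
  local f
  for f in "$@"; do
    if [ ! -s "$f" ]; then
      echo "missing or empty: $f" >&2
      return 1
    fi
  done
}
#+end_src

For example: =files_ready ~/sync/recordings/FILENAME.txt ~/sync/recordings/FILENAME.mkv=.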

** Step 6: Read and Analyze Transcript

1. Read the full transcript file

2. Identify speakers by analyzing context clues:
   - Names mentioned in conversation ("Thanks, Ryan")
   - Role references ("as the developer", "on the IT side")
   - Project-specific knowledge (who works on what)
   - Previous meeting context (known attendees)
   - Speaking order patterns

3. Build a speaker identification table:
   | Speaker | Person | Evidence |
   |---------+--------+----------|
   | A       | Name   | Clues... |
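
Before filling in the table, it can help to know how many distinct speaker labels the transcript contains and how often each one speaks. The sketch below assumes every turn starts with "Speaker X:" as in the Step 4 example output; =speaker_turns= is a hypothetical helper name.

#+begin_src bash
# Print one line per speaker label with its number of turns, e.g. "A: 2".
speaker_turns() {
  grep -oE '^Speaker [A-Z]+' "$1" | sort | uniq -c | awk '{print $3 ": " $1}'
}
#+end_src

A label with only one or two turns is often a diarization artifact or a brief interjection, which is worth flagging when presenting the table in Step 7.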

** Step 7: Confirm Speaker Identifications

Present the speaker identification table to Craig for confirmation:
- List each speaker label and proposed name
- Include the evidence/reasoning
- Ask about any uncertain identifications
- Note any new people to add to notes.org contacts

** Step 8: Create Labeled Transcript

1. Replace all speaker labels with actual names

2. Correct transcription errors:
   - Common mishearings (names, technical terms, company names)
   - Known substitutions from this project (the Common Transcription Errors table has the full list):
     - "Vanetti" → "Vineti"
     - "Fresh" → "Vrezh"
     - "Clean4" / "clone" → "CLIN 4"
     - "Vascan" → "Vazgan"
     - "Hike" / "Ike" → "Hayk"
     - "High Tech" → "HyeTech"
     - "Java software" → "JAMA software"
     - "JSON" (person) → "Jason"
     - "their S" / "ress" → "Nerses"
   - Technical terms specific to DeepSat (GovCloud, AFRL, SOUTHCOM, etc.)

3. Save to engagement assets folder:
   - Location: ~{engagement}/assets/~ (e.g., ~deepsat/assets/~)
   - Filename: YYYY-MM-DD-meeting-name.txt
   - Example: deepsat/assets/2026-02-03-standup-ipm-grooming.txt
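
The label replacement and the unambiguous corrections can be applied mechanically with =sed=. The sketch below is illustrative only: the speaker-to-name mapping (A is Craig, B is Ryan) is hypothetical and must come from the table confirmed in Step 7, and =label_transcript= is a hypothetical helper name.

#+begin_src bash
# Write a labeled, corrected copy of the raw transcript to stdout.
label_transcript() {
  # Hypothetical mapping: replace with the names confirmed in Step 7.
  sed \
    -e 's/^Speaker A:/Craig:/' \
    -e 's/^Speaker B:/Ryan:/' \
    -e 's/Vanetti/Vineti/g' \
    -e 's/Java software/JAMA software/g' \
    "$1"
}
#+end_src

Only substitutions that cannot collide with ordinary words are included here. Entries such as "Fresh", "clone", or "Hike" can be legitimate English, so those are better corrected by hand while reading the transcript.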

** Step 9: Copy Recording to Meetings Folder

1. Ensure the engagement's meetings folder exists and that .gitignore contains the pattern ~*/meetings/*.mkv~

2. Copy the .mkv file with descriptive name:
   #+begin_src bash
   cp ~/sync/recordings/YYYY-MM-DD_HH-MM-SS.mkv {engagement}/meetings/YYYY-MM-DD_HH-MM-meeting-name.mkv
   #+end_src
   Example: ~deepsat/meetings/2026-02-03_11-02-standup-ipm-grooming.mkv~

3. Verify the copy succeeded (the destination file exists and matches the source in size)
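
The naming and verification in this step could be scripted roughly as follows. This is a sketch under assumptions: =meeting_slug= and =copy_verified= are hypothetical helpers, and the meeting title "Standup / IPM Grooming" used below is an invented example of a calendar title.

#+begin_src bash
# Build a filename slug from a meeting title, e.g.
# "Standup / IPM Grooming" becomes "standup-ipm-grooming".
meeting_slug() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Copy SRC to DEST, creating the folder if needed, then confirm
# the copy is byte-identical before trusting it.
copy_verified() {
  local src=$1 dest=$2
  mkdir -p "$(dirname "$dest")" || return 1
  cp "$src" "$dest" && cmp -s "$src" "$dest"
}
#+end_src

A byte-for-byte comparison with =cmp= is slower than checking sizes on large recordings, but it matters here because Step 12 deletes the original.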

** Step 10: Update Session Context with Meeting Summary

Add a meeting summary section to docs/session-context.org including:

1. *Attendees* - List all participants

2. *Key Decisions* - Important choices made

3. *Action Items* - Tasks assigned, especially for Craig

4. *New Information* - Things learned that should be noted

5. *New Contacts* - People to add to notes.org

** Step 11: Write Session Context File

Update docs/session-context.org with:
- Files created this session (transcript, recording)
- Summary of what was processed
- Next steps (file to assets, update notes.org, etc.)

*** Context Management (for multiple files)

When processing multiple recordings in a queue:

1. *After completing each file's workflow*, update docs/session-context.org with:
   - Files processed so far
   - Current position in queue
   - Summary of meeting just processed

2. *Ask Craig if compact is needed* before starting the next file:
   - Transcript processing uses significant context
   - Compacting preserves session context for recovery

3. *If autocompact occurs*, reread session-context.org to:
   - Resume at correct position in queue
   - Avoid reprocessing already-completed files

** Step 12: Clean Up Source Files

After successful completion of all previous steps, delete the source files from ~/sync/recordings/:

1. *Delete the original recording:*
   #+begin_src bash
   rm ~/sync/recordings/FILENAME.mkv
   #+end_src

2. *Delete the raw transcript* (if generated):
   #+begin_src bash
   rm ~/sync/recordings/FILENAME.txt
   #+end_src

This step happens last to ensure all files are safely copied/processed before deletion. If anything goes wrong earlier in the workflow, the source files remain intact for retry.
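
The "delete only after everything succeeded" rule can be enforced in the command itself rather than by memory. The sketch below refuses to delete unless both archived outputs exist and are non-empty; =cleanup_sources= is a hypothetical helper name.

#+begin_src bash
# Delete the source recording and raw transcript only when the archived
# recording and the labeled transcript are both present and non-empty.
cleanup_sources() {
  local recording=$1 raw_transcript=$2 archived_recording=$3 labeled_transcript=$4
  if [ -s "$archived_recording" ] && [ -s "$labeled_transcript" ]; then
    rm -f "$recording" "$raw_transcript"
  else
    echo "outputs missing; leaving source files in place" >&2
    return 1
  fi
}
#+end_src

The argument order is sources first, outputs second, so a call reads as "remove these two because those two exist".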

* Output Files

| File               | Location                                                | Purpose                            |
|--------------------+---------------------------------------------------------+------------------------------------|
| Labeled transcript | {engagement}/assets/YYYY-MM-DD-meeting-name.txt         | Corrected transcript for reference |
| Meeting recording  | {engagement}/meetings/YYYY-MM-DD_HH-MM-meeting-name.mkv | Video for review (gitignored)      |
| Session context    | docs/session-context.org                                | Crash recovery, meeting summary    |
| Knowledge base     | {engagement}/knowledge.org                              | Team, infrastructure, corrections  |

* Common Transcription Errors

Keep this list updated as new patterns emerge:

| Heard As      | Correct       | Context                                        |
|---------------+---------------+------------------------------------------------|
| Vanetti       | Vineti        | Company where Craig, Nerses, Eric, Ryan worked |
| Fresh         | Vrezh         | Developer name                                 |
| Clean4, clone | CLIN 4        | Contract milestone                             |
| Vascan        | Vazgan        | MagicalLabs AI team member                     |
| Hike, Ike     | Hayk          | CTO name                                       |
| High Tech     | HyeTech       | Armenian tech community org                    |
| Java software | JAMA software | Requirements traceability tool                 |
| JSON (person) | Jason         | DevSecOps or advisor                           |
| their S, ress | Nerses        | CEO name                                       |
| sir Keith     | Sarkis        | BD/investor relations                          |
| Fastgas       | MagicalLabs   | Armenian AI contractor                         |
| Sitelix       | Cytellix      | CMMC security/compliance partner               |
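
A quick scan for the entries in this table can point to the lines that need attention before the manual pass. The sketch below covers only the table entries that are unlikely to be ordinary words; =flag_mishearings= is a hypothetical helper name, and the pattern would need extending as the table grows.

#+begin_src bash
# Print "LINE:TEXT" for each transcript line containing a known mishearing.
# Exits non-zero when nothing is found, following grep's convention.
flag_mishearings() {
  grep -nE 'Vanetti|Vascan|High Tech|Java software|sir Keith|Fastgas|Sitelix' "$1"
}
#+end_src

This only flags candidates. Whether "High Tech" really means HyeTech in a given sentence still needs a human read.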

* Tips

1. *Read the whole transcript first* - Context from later in the meeting often helps identify speakers from earlier

2. *Use the calendar* - Meeting names help set expectations for who attended

3. *Check engagement knowledge.org* - Team roster and transcription corrections specific to this engagement

4. *Ask about unknowns* - If a new person appears, ask Craig for context

5. *Note new learnings* - Update engagement knowledge.org with new contacts, corrections, or context after processing

* Validation Checklist

- [ ] Engagement identified and destination paths set
- [ ] Session context written before starting
- [ ] Recording files listed and matched with calendar
- [ ] Craig selected files to process
- [ ] Audio extracted to .m4a (mono, 96k AAC)
- [ ] AssemblyAI transcription completed
- [ ] Intermediate .m4a file deleted
- [ ] Transcript file verified
- [ ] All speakers identified
- [ ] Speaker identifications confirmed with Craig
- [ ] Transcript corrected and saved to {engagement}/assets/
- [ ] Recording copied to {engagement}/meetings/ with proper name
- [ ] Session context updated with meeting summary
- [ ] New contacts/info flagged for {engagement}/knowledge.org update
- [ ] (If multiple files) Queue position tracked in session context
- [ ] Source files deleted from ~/sync/recordings/