#+TITLE: Task Audit Workflow
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-05-22

* Overview

A *task audit* reconciles each open task's recorded content against the actual state of the world, then fixes what's stale and surfaces what needs the user. It answers "is the information in this task still *true*?" — not "is this task still worth doing?"

That distinction is the whole point, and it separates this workflow from its two siblings:

- =open-tasks.org= — *displays* open tasks (list mode) or picks the next one. Read-only.
- =task-review.org= — *relevance/priority hygiene*: walk the oldest-unreviewed tasks, re-grade / keep / kill / mark DOING, stamp =:LAST_REVIEWED:=. Decides whether a task still belongs and at what priority.
- =task-audit.org= (this one) — *content reconciliation*: take the surviving tasks and check their recorded facts against reality (sessions, email, chat, ticketing, calendar, meeting recordings), update the stale facts, mark statuses that moved, consolidate duplicates, and flag the judgment calls only the user can answer.

They compose: =task-review= keeps the list lean; =task-audit= keeps the survivors honest. Run a task audit when a task's *content* looks stale (a "waiting on X" that already happened, a date that passed, a status that moved), not just when its priority is in question. As its final phase, =task-audit= chains a =task-review= pass (Phase F), so one audit run also refreshes relevance and priority without a second invocation.

* When to Use This Workflow

Trigger phrases:
- "task audit" / "audit my tasks" / "audit the open tasks"
- "are my tasks still accurate" / "reconcile my tasks" / "update the stale tasks"
- "go through the open tasks and update them"

Typical timing:
- After a gap where the world moved but the tasks didn't (travel, an offsite, a conference, a sprint).
- When the user notices a specific task is stale and wants the rest checked too.
- Periodically alongside (but distinct from) the lighter =task-review= habit.

Do *not* use this for picking what to work on (=open-tasks.org=) or for keep/kill/priority grooming (=task-review.org=).

* The Core Split: Autonomous vs. Flag

Every task lands in one of three buckets. The split is the discipline of the workflow.

1. *CURRENT* — the recorded content still matches reality. Leave it (optionally re-stamp =:LAST_REVIEWED:= if it was genuinely re-verified; don't bulk-stamp mechanically).
2. *STALE → update autonomously* — a fact the sources verify has changed: a status that moved, a date that passed, a "waiting on X" that resolved, a dead/renamed link, a duplicate to consolidate, a task the evidence clearly shows is done. Apply the factual update and bump =:LAST_REVIEWED:= on the tasks you edit.
3. *NEEDS-USER → flag, don't guess* — a decision (priority, send/don't-send, keep/kill, push-or-shelve), or a fact that lives only in the user's head (did the meeting happen, what's your read, who referred this). Surface it; never fabricate it.

The line: *factual reconciliation is autonomous; judgment calls and user-only facts get flagged.* Guessing on bucket 3 is the failure mode this workflow exists to prevent.

* The Workflow

** Phase A — Enumerate the open tasks

List every top-level task in the project's open-work section. In =todo.org= these are the =**= headings between the section heading (e.g. =* Work Open Work=) and the next =*= heading:

#+begin_src bash
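# START = line number of the open-work heading; NEXT = heading text of the section that follows it.
awk -v start="$START" -v next_section="$NEXT" 'NR>=start && index($0, "* " next_section)==1 {exit} NR>=start && /^\*\* / {print NR": "$0}' todo.org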
#+end_src

Record each task's keyword, priority, tags, and current =:LAST_REVIEWED:=. This is the audit checklist.
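The checklist itself can be drafted mechanically. The following is a minimal sketch, not the workflow's prescribed tooling: =audit_checklist= is a hypothetical helper name, and it assumes tasks are level-2 headings shaped like =** KEYWORD [#P] Title :tag1:tag2:= with a plain =:LAST_REVIEWED: YYYY-MM-DD= line in the task's own property drawer.

#+begin_src bash
# Sketch only: one tab-separated row per task in the named section:
#   line number, keyword, priority cookie, tags, :LAST_REVIEWED: value ("-" when absent).
# Field 2 of the heading is taken as the keyword, so a heading with no keyword
# reports its first title word instead.
audit_checklist() {
  local file=$1 section=$2
  awk -v section="$section" '
    function emit() {
      if (title != "") printf "%d\t%s\t%s\t%s\t%s\n", line, keyword, prio, tags, reviewed
      title = ""
    }
    /^\* / { emit(); in_section = (index($0, "* " section) == 1); next }
    !in_section { next }
    /^\*\* / {
      emit()
      line = NR; title = $0; keyword = $2; in_sub = 0
      prio = "-"; tags = "-"; reviewed = "-"
      if (match($0, /\[#[A-D]\]/)) prio = substr($0, RSTART, RLENGTH)
      if (match($0, /:[A-Za-z0-9_@:]+:[ \t]*$/)) {
        tags = substr($0, RSTART, RLENGTH)
        sub(/[ \t]+$/, "", tags)
      }
      next
    }
    /^\*\*\*+ / { in_sub = 1 }                      # stamps below here belong to sub-tasks
    !in_sub && /^[ \t]*:LAST_REVIEWED:/ { reviewed = $2 }
    END { emit() }
  ' "$file"
}
#+end_src

Called as =audit_checklist todo.org "Work Open Work"=, it matches the section by heading text rather than by line number, so the output survives edits made earlier in the file.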

** Phase B — Reconcile each task against the sources (read-only)

For each open task, read its body and cross-check its claims against the actual state of the world. The sources (use whichever the project has):

- *Recent session summaries* — =.ai/sessions/*.org= (read the =* Summary= sections; they capture what shipped and what moved).
- *Email* — the project's mail accounts (work + personal). Search by the people/threads the task names.
- *Chat* — Slack/Teams DMs, mentions, and the threads the task references.
- *Ticketing* — Linear/Jira/GitHub. Check the linked ticket's status, comments, and whether a PR merged.
- *Calendar* — did a scheduled event happen; is a SCHEDULED/DEADLINE date now past.
- *Meeting recordings* — if a task hinges on "did this conversation happen / what was said," check the recording queue (e.g. =~/sync/recordings/=) and transcribe via =process-meeting-transcript.org= if the answer lives in an un-transcribed recording. (This is exactly how a "did the interview happen?" task gets resolved instead of guessed.)
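For the session-summary source in the list above, pulling just the =* Summary= sections keeps the read short. A minimal sketch, assuming each session file has a level-1 heading that begins =* Summary= (=session_summaries= is a hypothetical helper name):

#+begin_src bash
# Sketch only: print the "* Summary" section of every session file in a directory.
# Everything from the Summary heading up to the next level-1 heading is printed,
# including any sub-headings inside it.
session_summaries() {
  local dir=${1:-.ai/sessions} f
  for f in "$dir"/*.org; do
    [ -e "$f" ] || continue            # the glob matched nothing
    printf '== %s ==\n' "$f"
    awk '/^\* / { in_summary = ($0 ~ /^\* Summary/); next } in_summary' "$f"
  done
}
#+end_src

With no argument it reads =.ai/sessions/=; pass another directory to point it elsewhere.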

Assign each task a bucket (CURRENT / STALE / NEEDS-USER) and, for STALE, the specific factual update.
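A passed =SCHEDULED:= or =DEADLINE:= date is one of the easier STALE signals to surface mechanically. The sketch below is an illustration under assumptions, not part of the workflow's tooling: =past_planning_dates= is a hypothetical name, and it relies on Org's active-timestamp shape (=SCHEDULED: <2026-05-15 Fri>=). It only lists candidates; whether the event actually happened still has to come from the sources or the user.

#+begin_src bash
# Sketch only: list planning dates earlier than today, one tab-separated row each:
#   line number, kind (SCHEDULED or DEADLINE), date, nearest heading above.
# ISO dates compare correctly as plain strings, so no date arithmetic is needed.
past_planning_dates() {
  local file=$1 today=${2:-$(date "+%Y-%m-%d")}
  awk -v today="$today" '
    /^\*+ / { heading = $0 }
    {
      rest = $0
      while (match(rest, /(SCHEDULED|DEADLINE): <[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]/)) {
        stamp = substr(rest, RSTART, RLENGTH)
        kind = substr(stamp, 1, index(stamp, ":") - 1)
        day = substr(stamp, length(stamp) - 9)       # trailing YYYY-MM-DD
        if ((day "") < (today "")) printf "%d\t%s\t%s\t%s\n", NR, kind, day, heading
        rest = substr(rest, RSTART + RLENGTH)        # keep scanning the same line
      }
    }
  ' "$file"
}
#+end_src

The optional second argument pins "today", which makes the check repeatable when re-running an audit against an older snapshot.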

*Scale tactic.* For a large open-task set, dispatch read-only investigation sub-agents over batches of tasks (parallel-safe per =subagents.md= — independent read-only domains). Each returns a per-task bucket + suggested update. *Never* let sub-agents write to =todo.org= concurrently — apply all edits serially in the main thread (concurrent writes to one file race and lose work).

** Phase C — Apply the factual updates (autonomous)

For every STALE task, edit it in the main thread:

- Add a dated reconciliation note (=*** YYYY-MM-DD Day @ HH:MM:SS -ZZZZ <what changed>=) or update the offending line in place.
- Mark statuses that moved; rewrite "waiting on X" lines whose X resolved.
- Fix dead/renamed =file:= links. When you fix or verify a link that has a matching dead-link entry in the project's lint-followups file, reap that entry in the same edit so the two artifacts don't drift. Scope this strictly to dead-link entries. Do not pull general lint cleanup into the audit; that mixes two concerns and slows the audit down.
- *Consolidate duplicates* — when several tasks track the same thing, fold them into one home and delete the duplicates (per the user's call on which is canonical).
- *Close tasks the evidence shows are done.* When sessions, commits, tests, or tickets clearly show the work landed, close the task per =todo-format.md= (depth-based: top-level → =DONE= + =CLOSED:=; sub-task → dated event-log rewrite). Split the done case:
  - *Done and verified, or needing no human check* — close it directly.
  - *Code-complete but awaiting a human verification* (a visual/aesthetic sign-off, a live-device or live-remote check, an interactive walk). Do not leave it lingering half-open, and do not silently mark the feature verified. Instead:
    1. File the pending check as a child under the project's manual-testing parent (e.g. =Manual testing and validation= in =todo.org=), using the structured shape from =verification.md= (descriptive title, "What we're verifying", numbered steps, "Expected"). Search that parent first and skip filing when an equivalent check already exists.
    2. Close the original implementation task.
    The work is done; the human check lives on as a tracked manual-test that promotes to a bug if it fails. This keeps "done but unverified" from piling up as half-open tasks while still honoring "never claim a fix verified before the user confirms."
- *Ensure priority is set per the project scheme.* The top of the project's =todo.org= should carry the priority legend (=[#A]= through =[#D]=), and every task should carry an explicit priority cookie. If a cookie is missing, or no longer matches the reconciled facts, assign the right level per the legend. If the level is unambiguous from the body, do it autonomously; if it's a judgment call (especially the [#A] / [#B] line for important-but-not-urgent work), flag NEEDS-USER. Also enforce the [#A]-discipline rule from the legend: an [#A] task without a =SCHEDULED:= or =DEADLINE:= line is mis-graded. Either downgrade it to [#B] (when the reconciled facts say "important but not urgent") or surface it as NEEDS-USER so the user can date it.
- *Ensure a type tag is set.* Every task carries one type tag from the project's tag legend (typically =:feature:= / =:chore:= / =:spec:= / =:bug:=). If missing or wrong, assign or correct it from the body when the type is unambiguous. If two tags fit (a refactor that also fixes a bug; a spec that's also a chore), flag NEEDS-USER rather than picking one silently.
- *Enforce the project's declared tag vocabulary.* If the project's tag legend declares an *exhaustive* set of allowed tags, strip from each task any tag outside that set — the heading and parent section already carry topic/scope context, so ad-hoc tags only fragment the vocabulary and defeat tag-based filtering. Normalize near-duplicate spellings to the canonical tag (a plural to its singular, say). Where the legend does not declare the set closed, leave existing tags alone; this step applies only where the allowed set is exhaustive by design.
- *Re-assess the =:quick:= and =:solo:= tags* — reconciliation can change a task's effort or autonomy: a resolved dependency may make a stuck task =:solo:=, a scope cut may make it =:quick:=, and new complexity surfaced by the sources can invalidate either. Add or remove the tags per the definitions in the project's tag legend (and [[file:task-review.org][task-review.org]]) when the reconciled facts make the call clear. When they don't — an effort estimate you can't pin down, a =:solo:= gate you can't confirm — it's a NEEDS-USER flag, not a guess.
- Bump =:LAST_REVIEWED:= on each edited task.
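The two mechanical halves of the priority bullet above (a missing cookie, and an [#A] task with no date) can be listed before any judgment is applied. A minimal sketch, assuming tasks are level-2 headings and planning lines sit directly under them; =priority_findings= is a hypothetical helper name, and it scans the whole file it is given, so narrow the input first if closed tasks live in the same file.

#+begin_src bash
# Sketch only: report level-2 tasks that break the priority rules, one row each:
#   line number, finding (missing-priority or undated-A), heading.
# Dates on sub-tasks do not count toward the parent task.
priority_findings() {
  awk '
    function report() {
      if (heading != "") {
        if (prio == "") printf "%d\tmissing-priority\t%s\n", line, heading
        else if (prio == "[#A]" && !dated) printf "%d\tundated-A\t%s\n", line, heading
      }
      heading = ""
    }
    /^\*\*? / { report(); in_sub = 0 }               # a level-1 or level-2 heading ends the task
    /^\*\* / {
      line = NR; heading = $0; dated = 0; prio = ""
      if (match($0, /\[#[A-D]\]/)) prio = substr($0, RSTART, RLENGTH)
      next
    }
    /^\*\*\*+ / { in_sub = 1 }
    !in_sub && /(SCHEDULED|DEADLINE): *</ { dated = 1 }
    END { report() }
  ' "$1"
}
#+end_src

The output is a worklist, not a verdict: which level a task deserves, and whether an undated [#A] is downgraded or flagged, follows the rules in the bullet.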

Follow =todo-format.md= for completion mechanics (depth-based DONE vs dated-rewrite) and the working-files / link-hygiene rules when moving artifacts.
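For the tag-vocabulary step in Phase C, listing the out-of-set tags first makes the strip reviewable. The sketch below assumes the project's legend declares a closed set that you pass in by hand; =stray_tags= is a hypothetical helper name, and it only reports, leaving the edits to the serial pass in the main thread.

#+begin_src bash
# Sketch only: print "line<TAB>tag" for every heading tag outside the allowed set.
# Usage: stray_tags FILE TAG [TAG...]
stray_tags() {
  local file=$1; shift
  awk -v allowed="$*" '
    BEGIN { n = split(allowed, a, " "); for (i = 1; i <= n; i++) ok[a[i]] = 1 }
    /^\*+ / && match($0, /:[A-Za-z0-9_@:]+:[ \t]*$/) {
      m = split(substr($0, RSTART, RLENGTH), t, ":")
      for (i = 1; i <= m; i++) {
        tag = t[i]; gsub(/[ \t]/, "", tag)
        if (tag != "" && !(tag in ok)) printf "%d\t%s\n", NR, tag
      }
    }
  ' "$file"
}
#+end_src

For example, =stray_tags todo.org bug feature refactor test quick solo= lists every tag outside that six-tag set. Near-duplicate spellings (a plural of an allowed tag, say) show up here too, which is the cue to normalize rather than strip.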

** Phase D — Flag the judgment calls (interactive)

Present the NEEDS-USER bucket as a short, scannable list — one line per task, naming the decision or the fact required. Adjudicate with the user one item at a time (inline numbered options per =interaction.md=, no popup). Apply the user's calls as they come (which may itself produce more autonomous updates, or new tasks).

When the trail on a task has gone cold and only the user knows the outcome, *say so and ask* — do not invent a status. ("The screen was set for 5/15; nothing in sessions, email, or chat records what came of it. Did it happen, and what's your read?")

** Phase E — Close out

After Phase D's judgment calls are adjudicated and applied, stamp the audit run by updating =.ai/notes.org='s *Workflow State* section:

#+begin_example
:LAST_AUDIT: YYYY-MM-DD
#+end_example

Use today's =date= output (=date "+%Y-%m-%d"=). Overwrite the previous value — the marker reflects the most recent run, not a history.

If the *Workflow State* section doesn't exist yet, create it (placement: after *Active Reminders*). Other workflows (notably =open-tasks.org= Next Mode's audit-warranted check) read this marker to decide whether to offer running another audit.
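Overwriting the marker can be sketched as follows. This is an illustration under assumptions: =stamp_last_audit= is a hypothetical helper name, and it handles only the overwrite case. When no =:LAST_AUDIT:= line exists yet it returns non-zero, so the *Workflow State* section can be created deliberately in the right place rather than appended blindly.

#+begin_src bash
# Sketch only: overwrite the :LAST_AUDIT: value in a notes file.
# Usage: stamp_last_audit FILE [YYYY-MM-DD]   (the date defaults to today)
# Returns 1 without touching the file when no :LAST_AUDIT: line is present.
stamp_last_audit() {
  local notes=$1 today=${2:-$(date "+%Y-%m-%d")} tmp
  grep -q '^[[:space:]]*:LAST_AUDIT:' "$notes" || return 1
  tmp=$(mktemp) || return 1
  awk -v today="$today" '
    /^[ \t]*:LAST_AUDIT:/ { sub(/:LAST_AUDIT:.*/, ":LAST_AUDIT: " today) }
    { print }
  ' "$notes" > "$tmp" && cat "$tmp" > "$notes" && rm -f "$tmp"
}
#+end_src

Writing through a temporary file and copying the content back avoids the differences between GNU and BSD =sed -i=, and keeps the notes file's permissions unchanged.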

** Phase F — Chain a task-review pass

The audit confirmed the surviving tasks are factually honest. The lighter relevance/priority sweep is the natural follow-on, and running it here spares a second invocation. After Phase E's stamp, run [[file:task-review.org][task-review.org]] as the final phase: select the oldest-unreviewed batch, walk it (re-grade / keep / kill / mark DOING), and stamp =:LAST_REVIEWED:= per that workflow.

The split holds. =task-audit= owns factual accuracy and the =:LAST_AUDIT:= marker; =task-review= owns relevance, priority, and the per-task =:LAST_REVIEWED:= marker. Chaining keeps both markers fresh in one pass without merging the two concerns. If the audit is running in no-approvals mode, the chained review walk inherits that mode.

Skip the chain only when the user scoped the run to "audit only," or when =task-review= already ran today (its =:LAST_REVIEWED:= stamps are current). Otherwise it runs by default.
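The batch selection itself belongs to =task-review.org=, but a rough sketch shows what "oldest-unreviewed" could mean mechanically. Everything here is an assumption for illustration: =oldest_unreviewed= is a hypothetical helper name, tasks are taken to be level-2 headings, =:LAST_REVIEWED:= is taken to hold a plain =YYYY-MM-DD= date, and a task with no stamp sorts first.

#+begin_src bash
# Sketch only: print the COUNT least recently reviewed level-2 tasks as
#   "stamp<TAB>line<TAB>heading"; never-reviewed tasks get 0000-00-00 and sort first.
# Usage: oldest_unreviewed FILE [COUNT]   (COUNT defaults to 5)
oldest_unreviewed() {
  local file=$1 count=${2:-5}
  awk '
    function emit() {
      if (heading != "") printf "%s\t%d\t%s\n", reviewed, line, heading
      heading = ""
    }
    /^\*\*? / { emit(); in_sub = 0 }
    /^\*\* / { line = NR; heading = $0; reviewed = "0000-00-00"; next }
    /^\*\*\*+ / { in_sub = 1 }                       # ignore stamps on sub-tasks
    !in_sub && /^[ \t]*:LAST_REVIEWED:/ { reviewed = $2 }
    END { emit() }
  ' "$file" | sort -t $'\t' -k1,1 -k2,2n | head -n "$count"
}
#+end_src

Because Phase C bumps =:LAST_REVIEWED:= on every task it edits, tasks touched by the audit naturally fall to the back of this ordering, and the chained review spends its batch on tasks the audit left alone.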

* Common Mistakes

1. *Guessing a judgment call instead of flagging it.* If the answer isn't in the sources, it's bucket 3. Ask.
2. *Fabricating an outcome for a cold-trail task.* Surface the gap; don't fill it from imagination.
3. *Bulk re-stamping =:LAST_REVIEWED:= without actually re-verifying.* Stamp only what you checked.
4. *Concurrent sub-agent writes to =todo.org=.* Investigation fans out; edits stay serial in the main thread.
5. *Reconciling against memory instead of the live sources.* Email/chat/ticketing/recordings move independently of the task text — check them.
6. *Doing relevance grooming here.* Keep/kill/priority is =task-review='s job; this workflow is about factual accuracy.
7. *Leaving a code-complete task open because it awaits manual verification.* File the human check under the manual-testing parent (dedup first) and close the implementation task; "done but unverified" should not linger as a half-open task.

* Validation

Created and validated 2026-05-22 against the DeepSat work =todo.org=: reconciled the open-work tasks one at a time, applied factual updates (a contract task's gating, a hiring task whose interview outcome was recovered by transcribing an un-processed meeting recording, a partnership task's stale comms-hold + consolidation of two duplicate trackers, plus several "waiting on X resolved" updates), and surfaced the judgment-call subset (shelve-vs-push, awaiting-team-feedback, who-initiated-this) for the user to adjudicate.

* Revision Notes

** 2026-06-12 — Tag-vocabulary enforcement + code-complete-but-unverified closing

Two Phase C behaviors added, both surfaced by an Emacs-config =todo.org= audit:

- *Tag-vocabulary enforcement.* That project declares a closed tag set (=bug=, =feature=, =refactor=, =test=, =quick=, =solo=); the audit had to strip ~44 ad-hoc tags that had accumulated across the file. The prior workflow only checked that a type tag was *present* — it had no concept of an exhaustive allowed set. The new bullet enforces a declared closed vocabulary and leaves open-vocabulary projects untouched.
- *Code-complete-but-unverified closing.* Many tasks had shipped (tests green, live in the daemon) but stayed open awaiting a manual or visual verification, so they accumulated as half-open. Leaving them open is noise; auto-closing them would violate "never claim a fix verified before the user confirms." The fix routes the pending human check into the project's =Manual testing and validation= parent (dedup-checked) per =verification.md='s manual-verification hand-off, then closes the implementation task. The work is done and the check is tracked; a failed check promotes to a bug.