docs/design/task-review.org


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349

#+TITLE: Design: Daily Task-Review Habit
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2026-05-16
#+OPTIONS: toc:nil num:nil

* Metadata

- Date: 2026-05-16
- Status: Implemented — revised, see *Revision* below
- Supersedes: =wrap-it-up.org= date-coverage scan (lines 177-213 at design time)

* Revision — 2026-05-20: Shape B (pure Claude workflow), rulesets-owned

The implementation reversed the recommended approach. What actually shipped:

- *Pure Claude workflow, not Emacs elisp.* The "Recommended" approach below (custom =org-agenda= + keystroke minor mode in =task-review.el=) was dropped. The interactive elisp would have to be =require='d into Craig's running Emacs — an =init.el= line in the =archsetup= repo — which couples a rulesets-owned file to another repo's config. Since rulesets owns workflows and Craig already runs a daily Claude session (daily-prep), the friction objection that killed the "/task-review Claude command" alternative no longer holds. The habit is now =claude-templates/.ai/workflows/task-review.org=: Claude surfaces the batch, walks each task, and stamps =:LAST_REVIEWED:= via edits. No elisp, no =task-review.el=, no ERT — *component 3 and the ERT test surface are dropped*. Everything below that describes =task-review.el=, its exports, keymap, =defcustom= vars, and ERT tests is superseded.

- *Owned entirely by rulesets.* Earlier framing put =task-review.el= in =archsetup/dotfiles/=. With no elisp, archsetup is uninvolved. All references below to =archsetup/dotfiles/common/.emacs.d/lisp/= are void.

- *Name collision resolved.* =task-review.org= already existed as a task-listing + next-pick workflow. It was renamed to =open-tasks.org= (keeping its triggers minus "task review"); the new hygiene habit took the =task-review.org= name and the "task review" trigger.

- *Selection.* =task-review-staleness.sh= (component 1) gained a =--list <file> <N>= mode that emits the N oldest-unreviewed tasks for the workflow to walk — one source of truth for both the count (wrap-up/startup) and the batch (the habit).

- *Startup nudge promoted to template-level.* The open question "promote the startup reminder to template-level?" is resolved yes: it lives in template =startup.org= (Phase A count, Phase C surface), not the project-only =startup-extras.org=. Shape B distributes the workflow to every project via sync, so the per-project gating no longer applies.

The Problem statement, the rejected-approaches analysis, the selection algorithm, and the state model (=:LAST_REVIEWED:= property) below all still hold. The architecture and testing sections describe the abandoned elisp design — kept as a record of the decision, not as the plan.

* Problem

The wrap-up's date-coverage scan flags every open =[#A]= / =[#B]=
=todo.org= task that lacks a =DEADLINE:= or =SCHEDULED:= line. The
underlying assumption — that high-priority work needs a date — is
wrong. A task can be important without being time-bound (research,
foundational work, watch-list items). The scan generates per-night
cleanup work that the operator either dismisses repetitively or
papers over with bogus dates, neither of which improves the list.

The actual need is a recurring judgment habit: walk the task list,
verify each entry is still important, well-written, and actionable,
and act on what's drifted. Once the habit is in place, the date-
coverage scan becomes redundant.

Failure modes the habit is designed to catch (in order of severity):

1. /Priority drift/ — =[#A]=s that were urgent six months ago and
   aren't anymore; =[#C]=s that should be =[#A]=s now.
2. /Stalled commitments/ — tasks that keep getting deferred without
   ever being killed or started; they sit as low-grade guilt.
3. /Prose decay/ — entries whose body has lost context or whose next
   action isn't clear anymore.

Success metric: /trust/. When the operator opens =todo.org=, the
priorities and content feel accurate. Quantitative throughput targets
were rejected as distorting the thing they measure.

* Non-Goals

- Replacing =org-agenda= for daily planning. The review is a /list
  hygiene/ habit, not a planning ritual.
- Reviewing every task in the list each session. Coverage is
  rotational, ~12-day cycle.
- Managing children of top-level tasks. Children inherit parent
  priority; only top-level =**= headings are review units.
- Multi-file review. One =todo.org= per project per session.
- Replacing the lint-org pass. That stays as-is; only the date-
  coverage subsection of =wrap-it-up.org= retires.
- Cross-machine state sync beyond what =todo.org= already gets from
  the project's normal git/file-sync path.

* Approaches Considered

Six approaches were generated (three conventional, three tail).

** Recommended: Pure Emacs custom agenda + keystroke macros

A custom =org-agenda= view surfaces the N oldest-unreviewed top-level
=[#A]= / =[#B]= / =[#C]= entries from =todo.org=, sorted by a
=:LAST_REVIEWED:= property ascending (NIL sorts first). A minor
mode binds single keystrokes to actions (re-grade, kill, mark DOING,
keep, skip, edit). Each action stamps =:LAST_REVIEWED:= and advances
point. The review fires daily; cycle length ≈12 days at N=7.

/Why this wins/: trust as the success metric demands the habit
actually happens daily. Friction kills daily habits. Single
keystrokes in an agenda Craig already opens minimize friction. The
operator lives in Emacs already; routing through Claude every
morning fights that. Re-grade and kill (the dominant actions) are
keystroke-trivial. The 5-minute slot constraint rules out Claude
session round-trips.

/Trade-offs/: requires 1-2 evenings of elisp work upfront (vs. a
Claude command that could ship today). Prose touch-ups are rare per
the failure-mode analysis; when they happen, escalate manually via
=e= (edit in source buffer) or invoke Claude separately.

** Rejected: =/task-review= Claude command (daily-prep step)

Conversational walk per task. Pro: immediately buildable, no elisp.
Con: Claude session every morning eats most of the 5-min slot;
5-10 conversational decisions before coffee is heavy friction.

** Rejected: Hybrid Emacs + Claude on demand

Emacs surfaces the batch; Claude handles non-obvious decisions on
request. Pro: deep when needed. Con: two surfaces to maintain;
"when do I escalate" adds cognitive load. May become a v2 path if
the pure Emacs version surfaces a real need for Claude analysis.

** Rejected: No state, random daily sample

Probabilistic coverage via random selection. Pro: zero state. Con:
no guarantee every task gets reviewed in a window; stale tasks can
(probabilistically) hide for months. Harder to claim trust.

** Rejected: Review at write-time

Hook into task creation: prompt to re-grade 1-2 existing tasks
when adding a new one. Pro: no separate habit. Con: doesn't fire
on quiet weeks — exactly when rot accumulates most.

** Rejected: Section-rotating review (one section per day)

Each day review all tasks in one top-level org section. Pro: full
context for one domain. Con: section sizes wildly uneven; doesn't
surface oldest-untouched first; adds pressure to balance section
counts.

* Design

** State Model

A =:LAST_REVIEWED: YYYY-MM-DD= property on each top-level =**=
heading, inside the standard =:PROPERTIES:= drawer. Drawer folds by
default, so visual impact is one collapsed =:PROPERTIES:...:END:=
line per task. NIL (never reviewed) sorts as oldest, so first-pass
coverage is automatic without a manual seeding step.

Mutation surface: one elisp call,
=(org-entry-put (point) "LAST_REVIEWED" (format-time-string "%Y-%m-%d"))=.

** Architecture

Single elisp file: =task-review.el=. Lives at
=archsetup/dotfiles/common/.emacs.d/lisp/task-review.el=. Loaded by
=init.el= via =(require 'task-review)=. One key binding hooks the
entry point.

Exports:

- =task-review-agenda= — interactive entry point. Builds the custom
  agenda buffer.
- =task-review-mode= — minor mode owning the keymap.
- Action functions: one per keystroke, each ending in
  =task-review--stamp-and-refresh=.

Configuration variables (=defcustom=):

- =task-review-batch-size= — default 7. Range 3-10 makes sense.
- =task-review-todo-file= — default lookup:
  =projectile-project-root= + =/todo.org= if present, else fallback.
- =task-review-priorities= — default =("A" "B" "C")=.
- =task-review-confirm-kill= — default =t=. Single-key =y=
  confirmation on the =k= action.

** Selection Algorithm

=org-map-entries= over =task-review-todo-file=, keeping headings
that match every predicate:

- Heading depth = 2 (=**=; top-level under each section).
- Keyword ∈ ={TODO, DOING, VERIFY}=. Excludes DONE and CANCELLED.
- Priority cookie ∈ =task-review-priorities=.

Sort by =LAST_REVIEWED= ascending. NIL sorts first. Ties broken by
file position. Take first =task-review-batch-size= entries.

If zero candidates: display "nothing to review" and return without
opening a buffer. If fewer than batch size: surface what's available
(final batch of a cycle).

Cycle math at N=7 with ~80 open top-level tasks: ~12-day rotation,
3-5 minute daily session.

** Keymap

Bindings in =task-review-mode=:

| Key | Action | Side effects |
|-----+--------+--------------|
| =1= | Set priority to =[#A]= | Stamps, advances |
| =2= | Set priority to =[#B]= | Stamps, advances |
| =3= | Set priority to =[#C]= | Stamps, advances |
| =r= | Keep as-is | Stamps, advances |
| =k= | Kill → =CANCELLED= + =CLOSED:= line | y/n confirmation; no stamp (leaves pool); advances |
| =d= | =TODO= → =DOING= | Stamps; advances; no-op message on =DOING= / =VERIFY= |
| =e= | Open in source buffer for edit | Stamps on return |
| =s= | Skip without stamping | Advances (task stays in pool) |
| =q= | Quit | Saves =todo.org= silently; kills buffer |

=r= is the highest-frequency keystroke by design — most reviewed
tasks are still correct, just need re-stamping.

** Workflow file

Template-level workflow at
=claude-templates/.ai/workflows/task-review.org= (canonical), rsync'd
into every project's =.ai/workflows/task-review.org=. Registered in
=INDEX.org= with three trigger aliases: =task review=, =review tasks=,
=task-review=.

Body shape:

1. Precondition: confirm =todo.org= exists at the project root. If
   absent, surface "this project has no =todo.org=" and exit.
2. Open the review agenda:
   =emacsclient -n --eval "(task-review-agenda)"=.
3. Surface the keymap cheat sheet inline.
4. Wait silently while the operator works in Emacs.
5. Optional close-out: offer help on any task the operator marked
   =e= for prose touch-up.

** Wrap-up integration

The existing date-coverage scan in
=claude-templates/.ai/workflows/wrap-it-up.org= (lines 177-213 at
design time) gets replaced with a /review-habit health check/:

- Run =task-review-staleness.sh todo.org 30= (the extracted bash
  script — see Testing section).
- If the count is non-zero, append one line to =$followups=:
  "/N top-level =[#A]= / =[#B]= / =[#C]= tasks unreviewed for >30
  days. Daily review may have slipped./"
- If zero, silent. No per-task list, no follow-ups dump.

Threshold 30 days = ~2.5 cycle lengths of slack at N=7. One missed
review week is fine; three weeks signals habit problem.

** Startup reminder

Project-specific. =.ai/project-workflows/startup-extras.org= gets a
block calling =task-review-staleness.sh todo.org 7=. If the count
is non-zero, surface one line in Phase C findings: "/N top-level
tasks unreviewed for >7 days — say 'let's do a task review' to run
a cycle/."

Threshold 7 days = ~1 cycle of slack. Softer than the wrap-up alarm.

** Error handling

- =task-review-todo-file= missing → message "no todo.org found";
  return without opening a buffer.
- Malformed =LAST_REVIEWED= value → treated as NIL (sorts oldest).
- Level-1 and level-3+ headings ignored by the selection query.
- Headings without a priority cookie ignored.
- =d= on =DOING= or =VERIFY= → one-line message ("already DOING" /
  "VERIFY tasks transition via answer, not =d="); no state change.
- =k= confirmation aborted → no state change.

** Testing

Three test surfaces.

/1. ERT — =task-review.el= core/

Test file: =archsetup/dotfiles/common/.emacs.d/lisp/tests/test-task-review.el=.
~20 tests across Normal / Boundary / Error per =testing.md=.

Design discipline: pure data layer
(=task-review--candidates=, =task-review--sort-by-reviewed=,
=task-review--stamp=, =task-review--kill-to-cancelled=,
=task-review--transition-doing=, =task-review--regrade=) is
testable directly. UI shell (=task-review-agenda=,
=task-review-mode=, keymap) is composed of the data layer and
verified manually.

Each test uses =with-temp-buffer= + a synthetic =todo.org= fixture
string. No global state.

/2. Bats — =task-review-staleness.sh=/

The staleness math is extracted to
=claude-templates/.ai/scripts/task-review-staleness.sh= so both
wrap-up and startup-extras call the same source. Tests at
=claude-templates/.ai/scripts/tests/task-review-staleness.bats=
(~5 cases): zero candidates, all stale, all fresh, mixed,
threshold-boundary (exactly cutoff date).

/3. Manual smoke checklist/

1. Hit the binding → agenda opens with batch of 7.
2. Press =1= → priority changes to =[#A]=, =LAST_REVIEWED= stamps,
   cursor advances.
3. Press =k= then =y= → task transitions to =CANCELLED=, =CLOSED:=
   line added.
4. Press =q= → buffer closes, =todo.org= saved.
5. Reopen =task-review-agenda= → reviewed tasks no longer appear
   in the top 7.

** Observability

Implicit. =LAST_REVIEWED= timestamps themselves are the log of
when each task was last touched. =git blame= against =todo.org=
gives a deeper trail when needed. No separate logging surface.

* Open Questions

These deferred decisions should resolve during or shortly after
implementation:

- [ ] Exact key binding for =task-review-agenda= entry point.
  Suggested: =C-c r=. Likely candidate for arch-decide if a binding
  collision surfaces.
- [ ] Whether to promote the startup reminder to template-level
  (currently project-specific by Chunk 6 decision). Revisit once a
  second project adopts the elisp.
- [ ] Whether =org-id-get-create= should be added to top-level tasks
  during the first review cycle, hedging in case the state model
  later migrates to a separate state file. Probably premature.
- [ ] Behavior when =todo.org= is edited externally during a review
  session. =task-review-agenda= reads the buffer once; concurrent
  external edits could desync. Likely candidate: refresh on each
  action.

* Next Steps

Implementation status (revised to Shape B — see *Revision*):

- [X] =task-review-staleness.sh= + bats (=--list= and count modes).
- [X] =wrap-it-up.org= health check at threshold 30.
- [X] Existing =task-review.org= renamed to =open-tasks.org= (freed the name).
- [X] New Shape-B =task-review.org= workflow + INDEX entry.
- [X] Startup nudge in template =startup.org= (Phase A count, Phase C surface), threshold 7.
- [-] =task-review.el= + ERT — *dropped* (Shape B is a pure workflow).
- [ ] Smoke test against live =todo.org= — run one real review cycle.

Then:

1. /First cycle/ — run daily for ~2 weeks, gauge whether the habit sticks. Adjust the batch size (default 7) if it feels wrong.
2. /Retire/ — once the habit is established and trusted, the date-coverage scan retirement is complete. The review-habit health check stays as the watchdog.

* References

- Brainstorm session: =.ai/sessions/2026-05-16-*-task-review-design.org=
  (filed at wrap-up of this session).
- Replaced subsection: =claude-templates/.ai/workflows/wrap-it-up.org=
  lines 177-213 (commit immediately preceding implementation).
- Existing patterns followed:
  =.ai/scripts/lint-org.el= (canonical-source rule, ERT layout),
  =scripts/tests/audit.bats= (bats test patterns).