diff options
| author | Craig Jennings <c@cjennings.net> | 2026-05-30 13:55:05 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-05-30 13:55:05 -0500 |
| commit | 5bd759151d3ccf2d0a90f4b7de71e8c0e6e4a0a1 (patch) | |
| tree | d25e5300cfa05d272efb124af2dfd6470fed8e55 /.ai/workflows | |
| parent | 82e99ff8a4eb6d5aaba6ee02da3b5318a73b2125 (diff) | |
| download | rulesets-5bd759151d3ccf2d0a90f4b7de71e8c0e6e4a0a1.tar.gz rulesets-5bd759151d3ccf2d0a90f4b7de71e8c0e6e4a0a1.zip | |
feat(drill-deck): add authoring-quality checks and a card-authoring section
I researched spaced-repetition best practices (Wozniak's twenty rules, Matuschak's prompt-writing guide, Nielsen, the Anki and FSRS docs) and folded the findings into the drill-deck pipeline.
drill-deck-stats.py now checks authoring quality on top of structure. Two checks block: answer leakage (a question that echoes >= 80% of its own answer's content words tests recognition, not recall) and duplicate / near-duplicate fronts (confusable cards interfere). Three checks warn without blocking, surfacing rewrite candidates without failing the gate: overloaded backs, list-shaped backs, and binary yes/no prompts. The fuzzy thresholds live in constants at the top of the script, so a real deck that trips false positives can be tuned. I pulled the card-parsing into a parse_cards helper that captures each card's body, and added focused tests for every new helper plus CLI coverage of the leaky, duplicate, and notes-only cases.
drill-deck-review.org gains a Card Authoring Principles section (the why behind the canonical shapes, with sources), a person-card splitting path bounded by the :ID:-preservation rule, a Phase B cost-benefit-removal and leech-reformulation disposition, and a scheduling-is-Anki-side note so a future editor doesn't try to encode FSRS retention in the org source. I left out cloze cards (would need a second note type), per-card tractability targeting and retention encoding (Anki-side telemetry that never reaches the source), and on-face source-stamping (the converter strips those drawers by design). Each is noted with its reason.
Diffstat (limited to '.ai/workflows')
| -rw-r--r-- | .ai/workflows/drill-deck-review.org | 38 |
1 files changed, 36 insertions, 2 deletions
diff --git a/.ai/workflows/drill-deck-review.org b/.ai/workflows/drill-deck-review.org index 7e9eed5..fe12f3c 100644 --- a/.ai/workflows/drill-deck-review.org +++ b/.ai/workflows/drill-deck-review.org @@ -8,6 +8,8 @@ Take an org-drill flashcard file and bring it into the canonical shape — every The workflow has three substantive passes (question-form audit, content-accuracy audit, source rewrite) followed by a mechanical regenerate-and-place step. Content review is dispatched to a subagent because it's bounded research across project source-of-truth files; the structural rewrite stays in the main thread because it touches the SRS state we don't want to lose. Three helper scripts (=drill-deck-stats.py=, =drill-deck-diff-ids.py=, =drill-deck-sync=) automate the inventory, the safety check, and the regenerate-and-place. +*Scheduling lives on the Anki side.* Desired retention and the FSRS scheduling model are per-deck Anki options set on the phone, never controlled by the org source or =drill-to-anki.py=. The pipeline's only scheduling job is keeping each card's identity (the =:ID:=-derived GUID) stable so Anki's review history survives a rewrite. Don't try to encode retention, intervals, or org-drill's SM-2 state into the Anki output — the two schedulers are separate, and the import carries only card content plus identity. (Anki's desired-retention default is 90%; see [[https://docs.ankiweb.net/deck-options.html][the deck-options manual]].) + * When to Use This Workflow Trigger phrases: @@ -88,6 +90,8 @@ Format: "Who is X? Tell me about their Y." where X is a role descriptor that doe Note: pick a role descriptor that genuinely identifies one person. If multiple people share the role description, add a single distinguishing detail (e.g., "the one who works evenings", "the Vineti alum"). Don't pile on parentheticals. + Splitting: the person card deliberately trades atomicity for narrative recall — one card carries identity plus several attributes. When a body bundles genuinely unrelated attributes (role, employment history, limitations, scope) rather than one coherent topic, split it into multiple cards. One inherits the existing =:ID:= (and its SRS history); each new sibling starts fresh and will correctly show in =drill-deck-diff-ids.py= as an appeared ID. The criterion: split when the body reads as a list of separate facts, keep it whole when it reads as one story. (Minimum-information principle — Wozniak rule 4, Matuschak "Focused".) + *** Talking-points and directive cards Already in prompt form ("Introduce Yourself", "Spell out these orbital regime acronyms", "What is DeepSat?"). Leave the heading alone. Still strip the =*** Answer= sub-header and audit the body content for staleness. @@ -100,6 +104,26 @@ The =drill-deck-stats.py= helper recognizes both =?=-form and imperative-verb fo - *PROPERTIES drawer stays.* Org-drill needs the =:ID:=, =:DRILL_LAST_INTERVAL:=, =:DRILL_EASE:= etc. for SRS state. The Anki output strips it (see the script change). - *=SCHEDULED:= / =DEADLINE:= planning lines stay.* Same reason. The Anki output strips them. +* Card Authoring Principles + +The canonical shapes above are the house style; these are the reasons behind them, drawn from the spaced-repetition literature. =drill-deck-stats.py= checks the mechanical ones; the rest guide the rewrite and the content pass. + +- *One fact per card (minimum information principle).* A card should test a single retrievable connection. A back that bundles several independent facts gets partially recalled and burns repetitions on the parts you already know. When a body covers unrelated attributes, split it into separate cards. =drill-deck-stats.py= flags long backs as a non-blocking NOTE. + +- *Demand recall, not recognition (effortful retrieval).* Pulling the answer from memory is what strengthens it, so the question must not let you infer the answer from its own wording. This is why person headings never name the person, and why a question that restates its own answer is a defect. =drill-deck-stats.py= flags high front/back word overlap as answer leakage. + +- *Avoid binary prompts.* "Is X true?" and "A or B?" allow a coin-flip guess and produce shallow understanding. Reformulate open-ended — "How does X affect Y?" beats "Does X affect Y?" Flagged as a non-blocking NOTE. + +- *Avoid lists and enumerations.* Unordered sets past about five members, and long lists, recall poorly as a single card. Split the list across cards (overlapping cloze is the textbook alternative, but this pipeline has no cloze shape, so split instead). List-shaped backs are flagged as a non-blocking NOTE. + +- *Make cues precise.* A vague question admits several reasonable answers, so you can't tell whether you knew the intended one. Include enough context that only the intended answer fits, without narrowing into provincial trivia. + +- *Combat interference.* Confusable cards inhibit each other; two near-identical fronts are the worst case. Disambiguate them with distinguishing context, or merge them. =drill-deck-stats.py= flags duplicate / near-duplicate fronts. + +- *Understand before you memorize.* Cards are the last step, after the material is understood and structured. A card you can't explain is a leech waiting to happen. + +Sources: Wozniak's [[https://www.supermemo.com/en/blog/twenty-rules-of-formulating-knowledge][Twenty rules of formulating knowledge]], Andy Matuschak's [[https://andymatuschak.org/prompts/][How to write good prompts]], Michael Nielsen's [[https://augmentingcognition.com/ltm.html][Augmenting Long-term Memory]], and the [[https://docs.ankiweb.net/][Anki manual]]. + * Approach: Phases ** Phase A: Question-form + title audit (per card and per file) @@ -110,7 +134,7 @@ Run =drill-deck-stats.py= on the source first to get the structural inventory: .ai/scripts/drill-deck-stats.py <source.org> #+end_src -The script reports the deck title from =#+TITLE:= (and flags it if it contains source-tool jargon like "Org-Drill"), card count, PROPERTIES-drawer count, =*** Answer= sub-header count, cards missing =:ID:=, and cards whose heading is neither =?=-form nor an imperative-verb prompt. Each surfaced card is a candidate for the rewrite, plus the title itself if flagged. +The script reports the deck title from =#+TITLE:= (and flags it if it contains source-tool jargon like "Org-Drill"), card count, PROPERTIES-drawer count, =*** Answer= sub-header count, cards missing =:ID:=, and cards whose heading is neither =?=-form nor an imperative-verb prompt. It also flags possible answer leakage and duplicate / near-duplicate fronts (both blocking), and surfaces non-blocking NOTEs for overloaded, list-shaped, or binary cards. Each surfaced card is a candidate for the rewrite, plus the title itself if flagged. For each candidate, propose the new heading in advance so Phase C is mechanical. For person cards, the proposal is the role descriptor + topical anchor pair. For acronym/concept cards, the proposal is the existing body question promoted to the heading. @@ -136,6 +160,7 @@ Categories to look for: - Project facts: milestone shifts, submission states, exercise / demo dates - External contacts: title or affiliation changes - Company facts: head count, funding, customer status +- Removable cards: trivia not worth memorizing, or a fact whose underlying source no longer appears in any source-of-truth doc (flag as a deletion candidate, not a rewrite) Skip cards where you find no staleness. Cap at 2,000 words. #+end_example @@ -144,6 +169,8 @@ Include any user-supplied seed fixes in the dispatch (e.g., "Vrezh is now full-t Output of Phase B: a structured per-card list of content updates with confidence levels. High-confidence findings get baked in during Phase C. Medium-confidence findings are reviewed inline before baking. Low-confidence findings are surfaced but skipped unless the user calls them in. +*Removal and leeches.* Two dispositions beyond rewrite. (1) Cost-benefit removal: a card flagged as removable is a deletion candidate — weigh whether the fact clears a "worth memorizing" bar before keeping it. (2) Leech feedback: when Anki suspends a card as a leech (8 lapses by default), the card's formulation is the problem, not the review effort; route it back through Phase B/C as a reformulation target, preserving its =:ID:= so Anki keeps the lapse history. The org → Anki flow is one-directional: leech tags, lapse counts, and per-card success rates live in Anki and never flow back to the source, so these signals are carried in by hand. (Anki [[https://docs.ankiweb.net/leeches.html][leech]] guidance is "reformulate, don't grind".) + ** Phase C: Source rewrite Take Phase A's question-rewrite plan and Phase B's content-update list, apply them to the source file. Preserve every card's =:PROPERTIES:= drawer (especially =:ID:=) and =SCHEDULED:= line verbatim — those carry SRS state that must survive the rewrite. @@ -217,10 +244,12 @@ The core converter. Reads an org-drill source file, emits a stable-ID Anki =.apk ** =drill-deck-stats.py= -Inventory + workflow-violation warnings for a single deck source. Counts cards, PROPERTIES drawers, =*** Answer= sub-headers, cards missing =:ID:=, and cards whose heading is neither =?=-form nor an imperative-verb prompt. Exits 0 when clean, 1 when warnings present, so it gates =drill-deck-sync=. +Inventory + authoring-quality checks for a single deck source. Counts cards, PROPERTIES drawers, =*** Answer= sub-headers, cards missing =:ID:=, and cards whose heading is neither =?=-form nor an imperative-verb prompt. It also checks authoring quality: answer leakage (front/back content-word overlap) and duplicate / near-duplicate fronts are blocking WARNs; overloaded backs, list-shaped backs, and binary prompts are non-blocking NOTEs. Exits 0 when no blocking warning is present, 1 otherwise, so it gates =drill-deck-sync=. Imperative-verb allowlist: Spell, Describe, Explain, Name, List, Give, Show, Tell, Define, Compare, Identify, Outline, Introduce, Walk, State, Recite, Recall, Summarize. +The fuzzy checks (leakage ratio, overloaded word count) are tuned by the =LEAKAGE_*= and =BACK_WORD_LIMIT= constants at the top of the script. Loosen them if a real deck trips false positives. + ** =drill-deck-diff-ids.py= SRS-state preservation check between two versions of a deck. Extracts every =:ID:= from each, reports IDs that disappeared (lost SRS state — worst-case bug) or appeared (new cards). Exits 0 when clean, 1 when any disappeared/appeared. @@ -264,6 +293,8 @@ If you find the script doing something else, update the script before regenerati 5. *Skipping the content-accuracy pass.* The structural rewrite alone leaves stale facts in place. The drill cards become a memorization tool for the wrong information. 6. *Treating subagent output as gospel.* Medium- and low-confidence findings need human review before baking. The subagent surfaces; the main thread decides. 7. *Running =drill-deck-sync= without =--diff-against=.* The stats check still runs, but the SRS-state preservation check doesn't. On a rewrite of any size, pass =--diff-against /tmp/<name>-prerewrite.org= (grab from git first). +8. *Answer leakage.* A question that restates its own answer tests recognition, not recall — the card looks learned when it isn't. =drill-deck-stats.py= flags high front/back word overlap. +9. *Encoding scheduling in the source.* Retention, intervals, and FSRS state are Anki-side options; the org files and =drill-to-anki.py= carry only card content plus identity. See the scheduling note in the Overview. * Living Document @@ -284,3 +315,6 @@ After the first run, scripted the safety-net checks into three helpers: =drill-d *** 2026-05-30: Title-audit added (same day) Craig noticed the Anki deck name still showed as "DeepSat Org-Drill Flashcards" because the source =#+TITLE:= leaks tool-name jargon into Anki. Added a "Deck title" subsection under Canonical Card Shape, expanded Phase A to audit the title, and extended =drill-deck-stats.py= to flag any title matching =org[-\s]?drill= (case-insensitive). Stable-ID caveat documented: renaming the deck changes the Anki deck ID, so the next import lands as a new deck and the old one needs deleting from Anki. + +*** 2026-05-30: Authoring-quality checks + Card Authoring section (same day) +Researched flashcard / spaced-repetition best practices (Wozniak's twenty rules, Matuschak's prompt-writing guide, Nielsen, the Anki manual, the FSRS docs) and folded the findings in. =drill-deck-stats.py= gained answer-leakage and duplicate-front checks (blocking), plus non-blocking NOTEs for overloaded backs, list-shaped backs, and binary prompts. Added a "Card Authoring Principles" section (the why behind the canonical shapes), a person-card splitting path, a Phase B cost-benefit-removal + leech-feedback disposition, and a scheduling-is-Anki-side note in the Overview. Deliberately not adopted, with reasons: cloze cards (would need a second note type and an authoring convention), per-card tractability targeting and FSRS-retention encoding (Anki-side telemetry that never flows back to the source), on-face source-stamping (the converter strips those drawers by design; provenance stays in the org layer). |
