1 files changed, 27 insertions, 0 deletions
diff --git a/claude-rules/verification.md b/claude-rules/verification.md
index 673a54d..617c02c 100644
--- a/claude-rules/verification.md
+++ b/claude-rules/verification.md
@@ -44,6 +44,33 @@ When a check cannot run, report it in this order:
 
 Do not let an unverifiable check vanish into a confident summary. State it plainly and hand the gap to the user.
 
+## Handing Off Manual Verification
+
+Some checks can only be run by the user: interactive UI a script can't drive, a live external service, visual rendering (colors, layout, faces), a real device, or anything where the verification *is* a human looking at the result. When the gap needs the user's hands or eyes — not just a command they could paste — don't bury the steps in prose. Write them as a structured, runnable checklist in the project's task file.
+
+Create (or append to) a single parent task named **"Manual testing and validation"** in the project's todo file (`todo.org`, or the project's equivalent). Under it, write **one org sub-header per test**:
+
+- **Title** — descriptive, naming the behavior under test (not "test 1").
+- **"What we're verifying:"** — one line on the intent, so a failed test explains itself.
+- **Steps** — a bullet list, **one action per item**, in order. Concrete: the exact command to run, key to press, or file to open.
+- **"Expected:"** — the observable result after the last step. One outcome, stated plainly, so a mismatch is unambiguous.
+
+Format:
+
+```
+** TODO Manual testing and validation
+*** <descriptive title>
+What we're verifying: <one line>.
+- <step one>
+- <step two>
+- <step three>
+Expected: <the observable result after the last step>.
+```
+
+**Promote on failure.** If the actual behavior matches Expected, the test passed — the user marks or deletes it. If it differs, the user writes the actual behavior and any notes under that test, flips its header to a `TODO`, and promotes it to a top-level task. The structured shape is what makes that one-step: the failing case already carries the repro and the expected result, so it becomes a tracked bug without rewriting anything.
+
+Use this whenever the verification gap from "When You Cannot Verify" above is a human-in-the-loop check rather than a command the user can run blind. Write every such test you want run, not a representative sample — the user runs the checklist once and reports back.
+
 ## Before Committing
 
 Before any commit: