From 234163dfed4aae1984889c03679eb62f7d8e42dc Mon Sep 17 00:00:00 2001
From: Craig Jennings <c@cjennings.net>
Date: Sun, 24 May 2026 00:49:53 -0500
Subject: docs(verification): add a manual-verification handoff format
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When a verification gap needs the user's hands or eyes — interactive UI a script can't drive, a live service, visual rendering — describing the steps in prose isn't enough. I added a section to verification.md that says to write them as a "Manual testing and validation" task: one sub-header per test, each with a descriptive title, what we're verifying, the steps as a bullet list, and the expected result. If a test fails, the user writes the actual behavior, flips the header to a TODO, and promotes it, so a failed check becomes a tracked bug in one step.
---
 claude-rules/verification.md | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
diff --git a/claude-rules/verification.md b/claude-rules/verification.md
index 673a54d..617c02c 100644
--- a/claude-rules/verification.md
+++ b/claude-rules/verification.md
@@ -44,6 +44,33 @@ When a check cannot run, report it in this order:
 
 Do not let an unverifiable check vanish into a confident summary. State it plainly and hand the gap to the user.
 
+## Handing Off Manual Verification
+
+Some checks can only be run by the user: interactive UI a script can't drive, a live external service, visual rendering (colors, layout, faces), a real device, or anything where the verification *is* a human looking at the result. When the gap needs the user's hands or eyes — not just a command they could paste — don't bury the steps in prose. Write them as a structured, runnable checklist in the project's task file.
+
+Create (or append to) a single parent task named **"Manual testing and validation"** in the project's todo file (`todo.org`, or the project's equivalent). Under it, write **one org sub-header per test**:
+
+- **Title** — descriptive, naming the behavior under test (not "test 1").
+- **"What we're verifying:"** — one line on the intent, so a failed test explains itself.
+- **Steps** — a bullet list, **one action per item**, in order. Concrete: the exact command to run, key to press, or file to open.
+- **"Expected:"** — the observable result after the last step. One outcome, stated plainly, so a mismatch is unambiguous.
+
+Format:
+
+```
+** TODO Manual testing and validation
+*** <descriptive title>
+What we're verifying: <one line>.
+- <step one>
+- <step two>
+- <step three>
+Expected: <the observable result after the last step>.
+```
+
+**Promote on failure.** If the actual behavior matches Expected, the test passed — the user marks or deletes it. If it differs, the user writes the actual behavior and any notes under that test, flips its header to a `TODO`, and promotes it to a top-level task. The structured shape is what makes that one-step: the failing case already carries the repro and the expected result, so it becomes a tracked bug without rewriting anything.
+
+Use this whenever the verification gap from "When You Cannot Verify" above is a human-in-the-loop check rather than a command the user can run blind. Write every such test you want run, not a representative sample — the user runs the checklist once and reports back.
+
 ## Before Committing
 
 Before any commit:
-- 
cgit v1.2.3