1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
|
# Verification Before Completion
Applies to: `**/*`
## The Rule
Do not claim work is done without fresh verification evidence. Run the command, read the output, confirm it matches the claim, then — and only then — declare success.
This applies to every completion claim:
- "Tests pass" → Run the test suite. Read the output. Confirm all green.
- "Linter is clean" → Run the linter. Read the output. Confirm no warnings.
- "Build succeeds" → Run the build. Read the output. Confirm no errors.
- "Bug is fixed" → Run the reproduction steps. Confirm the bug is gone.
- "No regressions" → Run the full test suite, not just the tests you added.
## What Fresh Means
- Run the verification command **now**, in the current session
- Do not rely on a previous run from before your changes
- Do not assume your changes didn't break something unrelated
- Do not extrapolate from partial output — read the whole result
## Red Flags
If you find yourself using these words, you haven't verified:
- "should" ("tests should pass")
- "probably" ("this probably works")
- "I believe" ("I believe the build is clean")
- "based on the changes" ("based on the changes, nothing should break")
Replace beliefs with evidence. Run the command.
## When You Cannot Verify
Sometimes the verification command cannot run: the tool is absent, there is no network, a sandbox blocks it, or the environment is missing a dependency. A check that did not run must never be reported as a pass. "Unable to verify" is an honest, required outcome — not silence, and not an optimistic "should work."
When a check cannot run, report it in this order:
1. **Command attempted** — the exact command you tried to run.
2. **Why it could not run** — tool absent, no network, sandbox limit, missing environment, etc.
3. **Risk left unverified** — what might be broken that you would not know about, given the check did not run.
4. **Next command to close the gap** — the smallest command the user can run to verify themselves.
Do not let an unverifiable check vanish into a confident summary. State it plainly and hand the gap to the user.
## Handing Off Manual Verification
Some checks can only be run by the user: interactive UI a script can't drive, a live external service, visual rendering (colors, layout, faces), a real device, or anything where the verification *is* a human looking at the result. When the gap needs the user's hands or eyes — not just a command they could paste — don't bury the steps in prose. Write them as a structured, runnable checklist in the project's task file.
Create (or append to) a single parent task named **"Manual testing and validation"** in the project's todo file (`todo.org`, or the project's equivalent). Under it, write **one org sub-header per test**:
- **Title** — descriptive, naming the behavior under test (not "test 1").
- **"What we're verifying:"** — one line on the intent, so a failed test explains itself.
- **Steps** — a bullet list, **one action per item**, in order. Concrete: the exact command to run, key to press, or file to open.
- **"Expected:"** — the observable result after the last step. One outcome, stated plainly, so a mismatch is unambiguous.
Format:
```
** TODO Manual testing and validation
*** <descriptive title>
What we're verifying: <one line>.
- <step one>
- <step two>
- <step three>
Expected: <the observable result after the last step>.
```
**Promote on failure.** If the actual behavior matches Expected, the test passed — the user marks or deletes it. If it differs, the user writes the actual behavior and any notes under that test, flips its header to a `TODO`, and promotes it to a top-level task. The structured shape is what makes that one-step: the failing case already carries the repro and the expected result, so it becomes a tracked bug without rewriting anything.
Use this whenever the verification gap from "When You Cannot Verify" above is a human-in-the-loop check rather than a command the user can run blind. Write every such test you want run, not a representative sample — the user runs the checklist once and reports back.
## Before Committing
Before any commit:
1. Run the test suite — confirm all tests pass
2. Run the linter — confirm no new warnings
3. Run the type checker — confirm no new errors
4. Review the diff — confirm only intended changes are staged
Do not commit based on the assumption that nothing broke. Verify.
|