claude-rules/verification.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82

# Verification Before Completion

Applies to: `**/*`

## The Rule

Do not claim work is done without fresh verification evidence. Run the command, read the output, confirm it matches the claim, then — and only then — declare success.

This applies to every completion claim:
- "Tests pass" → Run the test suite. Read the output. Confirm all green.
- "Linter is clean" → Run the linter. Read the output. Confirm no warnings.
- "Build succeeds" → Run the build. Read the output. Confirm no errors.
- "Bug is fixed" → Run the reproduction steps. Confirm the bug is gone.
- "No regressions" → Run the full test suite, not just the tests you added.

## What Fresh Means

- Run the verification command **now**, in the current session
- Do not rely on a previous run from before your changes
- Do not assume your changes didn't break something unrelated
- Do not extrapolate from partial output — read the whole result

## Red Flags

If you find yourself using these words, you haven't verified:

- "should" ("tests should pass")
- "probably" ("this probably works")
- "I believe" ("I believe the build is clean")
- "based on the changes" ("based on the changes, nothing should break")

Replace beliefs with evidence. Run the command.

## When You Cannot Verify

Sometimes the verification command cannot run: the tool is absent, there is no network, a sandbox blocks it, or the environment is missing a dependency. A check that did not run must never be reported as a pass. "Unable to verify" is an honest, required outcome — not silence, and not an optimistic "should work."

When a check cannot run, report it in this order:

1. **Command attempted** — the exact command you tried to run.
2. **Why it could not run** — tool absent, no network, sandbox limit, missing environment, etc.
3. **Risk left unverified** — what might be broken that you would not know about, given the check did not run.
4. **Next command to close the gap** — the smallest command the user can run to verify themselves.

Do not let an unverifiable check vanish into a confident summary. State it plainly and hand the gap to the user.

## Handing Off Manual Verification

Some checks can only be run by the user: interactive UI a script can't drive, a live external service, visual rendering (colors, layout, faces), a real device, or anything where the verification *is* a human looking at the result. When the gap needs the user's hands or eyes — not just a command they could paste — don't bury the steps in prose. Write them as a structured, runnable checklist in the project's task file.

Create (or append to) a single parent task named **"Manual testing and validation"** in the project's todo file (`todo.org`, or the project's equivalent). Under it, write **one org sub-header per test**:

- **Title** — descriptive, naming the behavior under test (not "test 1").
- **"What we're verifying:"** — one line on the intent, so a failed test explains itself.
- **Steps** — a bullet list, **one action per item**, in order. Concrete: the exact command to run, key to press, or file to open.
- **"Expected:"** — the observable result after the last step. One outcome, stated plainly, so a mismatch is unambiguous.

Format:

```
** TODO Manual testing and validation
*** <descriptive title>
What we're verifying: <one line>.
- <step one>
- <step two>
- <step three>
Expected: <the observable result after the last step>.
```

**Promote on failure.** If the actual behavior matches Expected, the test passed — the user marks or deletes it. If it differs, the user writes the actual behavior and any notes under that test, flips its header to a `TODO`, and promotes it to a top-level task. The structured shape is what makes that one-step: the failing case already carries the repro and the expected result, so it becomes a tracked bug without rewriting anything.

Use this whenever the verification gap from "When You Cannot Verify" above is a human-in-the-loop check rather than a command the user can run blind. Write every such test you want run, not a representative sample — the user runs the checklist once and reports back.

## Before Committing

Before any commit:
1. Run the test suite — confirm all tests pass
2. Run the linter — confirm no new warnings
3. Run the type checker — confirm no new errors
4. Review the diff — confirm only intended changes are staged

Do not commit based on the assumption that nothing broke. Verify.