# Verification Before Completion Applies to: `**/*` ## The Rule Do not claim work is done without fresh verification evidence. Run the command, read the output, confirm it matches the claim, then — and only then — declare success. This applies to every completion claim: - "Tests pass" → Run the test suite. Read the output. Confirm all green. - "Linter is clean" → Run the linter. Read the output. Confirm no warnings. - "Build succeeds" → Run the build. Read the output. Confirm no errors. - "Bug is fixed" → Run the reproduction steps. Confirm the bug is gone. - "No regressions" → Run the full test suite, not just the tests you added. ## What Fresh Means - Run the verification command **now**, in the current session - Do not rely on a previous run from before your changes - Do not assume your changes didn't break something unrelated - Do not extrapolate from partial output — read the whole result ## Red Flags If you find yourself using these words, you haven't verified: - "should" ("tests should pass") - "probably" ("this probably works") - "I believe" ("I believe the build is clean") - "based on the changes" ("based on the changes, nothing should break") Replace beliefs with evidence. Run the command. ## When You Cannot Verify Sometimes the verification command cannot run: the tool is absent, there is no network, a sandbox blocks it, or the environment is missing a dependency. A check that did not run must never be reported as a pass. "Unable to verify" is an honest, required outcome — not silence, and not an optimistic "should work." When a check cannot run, report it in this order: 1. **Command attempted** — the exact command you tried to run. 2. **Why it could not run** — tool absent, no network, sandbox limit, missing environment, etc. 3. **Risk left unverified** — what might be broken that you would not know about, given the check did not run. 4. **Next command to close the gap** — the smallest command the user can run to verify themselves. Do not let an unverifiable check vanish into a confident summary. State it plainly and hand the gap to the user. ## Handing Off Manual Verification Some checks can only be run by the user: interactive UI a script can't drive, a live external service, visual rendering (colors, layout, faces), a real device, or anything where the verification *is* a human looking at the result. When the gap needs the user's hands or eyes — not just a command they could paste — don't bury the steps in prose. Write them as a structured, runnable checklist in the project's task file. Create (or append to) a single parent task named **"Manual testing and validation"** in the project's todo file (`todo.org`, or the project's equivalent). Under it, write **one org sub-header per test**: - **Title** — descriptive, naming the behavior under test (not "test 1"). - **"What we're verifying:"** — one line on the intent, so a failed test explains itself. - **Steps** — a bullet list, **one action per item**, in order. Concrete: the exact command to run, key to press, or file to open. - **"Expected:"** — the observable result after the last step. One outcome, stated plainly, so a mismatch is unambiguous. Format: ``` ** TODO Manual testing and validation *** What we're verifying: . - - - Expected: . ``` **Promote on failure.** If the actual behavior matches Expected, the test passed — the user marks or deletes it. If it differs, the user writes the actual behavior and any notes under that test, flips its header to a `TODO`, and promotes it to a top-level task. The structured shape is what makes that one-step: the failing case already carries the repro and the expected result, so it becomes a tracked bug without rewriting anything. Use this whenever the verification gap from "When You Cannot Verify" above is a human-in-the-loop check rather than a command the user can run blind. Write every such test you want run, not a representative sample — the user runs the checklist once and reports back. ## Before Committing Before any commit: 1. Run the test suite — confirm all tests pass 2. Run the linter — confirm no new warnings 3. Run the type checker — confirm no new errors 4. Review the diff — confirm only intended changes are staged Do not commit based on the assumption that nothing broke. Verify.