# Verification Before Completion Applies to: `**/*` ## The Rule Do not claim work is done without fresh verification evidence. Run the command, read the output, confirm it matches the claim, then — and only then — declare success. This applies to every completion claim: - "Tests pass" → Run the test suite. Read the output. Confirm all green. - "Linter is clean" → Run the linter. Read the output. Confirm no warnings. - "Build succeeds" → Run the build. Read the output. Confirm no errors. - "Bug is fixed" → Run the reproduction steps. Confirm the bug is gone. - "No regressions" → Run the full test suite, not just the tests you added. ## What Fresh Means - Run the verification command **now**, in the current session - Do not rely on a previous run from before your changes - Do not assume your changes didn't break something unrelated - Do not extrapolate from partial output — read the whole result ## Red Flags If you find yourself using these words, you haven't verified: - "should" ("tests should pass") - "probably" ("this probably works") - "I believe" ("I believe the build is clean") - "based on the changes" ("based on the changes, nothing should break") Replace beliefs with evidence. Run the command. ## A Passing Gate Can Skip Your New File A green check only counts for the files the check actually ran on. When a lint, test, or format gate runs against an explicit, hand-maintained file list rather than a glob or auto-discovery, a newly-added file is silently skipped until someone adds its path by hand. The gate reports clean while never looking at the new file — a false pass that reads exactly like a real one. The failure mode isn't tool-specific. It fires anywhere a quality gate enumerates files instead of discovering them: a Makefile lint target listing paths, a pre-commit `files:` regex that doesn't match a new extension, a CI matrix with hardcoded paths, a coverage config with an explicit include list. So when you add a file, confirm the gates see it. Check whether each lint/test/format gate discovers files automatically or enumerates them, and if it enumerates, add the new path before trusting the green check. "Linter is clean" means nothing for a file the linter never ran on. ## When You Cannot Verify Sometimes the verification command cannot run: the tool is absent, there is no network, a sandbox blocks it, or the environment is missing a dependency. A check that did not run must never be reported as a pass. "Unable to verify" is an honest, required outcome — not silence, and not an optimistic "should work." When a check cannot run, report it in this order: 1. **Command attempted** — the exact command you tried to run. 2. **Why it could not run** — tool absent, no network, sandbox limit, missing environment, etc. 3. **Risk left unverified** — what might be broken that you would not know about, given the check did not run. 4. **Next command to close the gap** — the smallest command the user can run to verify themselves. Do not let an unverifiable check vanish into a confident summary. State it plainly and hand the gap to the user. ## Handing Off Manual Verification Some checks can only be run by the user: interactive UI a script can't drive, a live external service, visual rendering (colors, layout, faces), a real device, or anything where the verification *is* a human looking at the result. When the gap needs the user's hands or eyes — not just a command they could paste — don't bury the steps in prose. Write them as a structured, runnable checklist in the project's task file. Create (or append to) a single parent task named **"Manual testing and validation"** in the project's todo file (`todo.org`, or the project's equivalent). Under it, write **one org sub-header per test**: - **Title** — descriptive, naming the behavior under test (not "test 1"). - **"What we're verifying:"** — one line on the intent, so a failed test explains itself. - **Steps** — **one action per item**, in order. Manual actions (a key to press, a phone step, a file to open, the `M-x` command under test) are list bullets, stated concretely. A step that *is* code the user will execute goes in an org src block, not a bullet — see below. - **"Expected:"** — the observable result after the last step. One outcome, stated plainly, so a mismatch is unambiguous. **Executable steps go in src blocks.** When a step is code the user runs — verification setup, bug-reproduction steps, walkthrough wiring — put it in an org src block so the user can execute it in place (`C-c C-c`) and read the result in the buffer, rather than copy-pasting a bullet: - Use the correct language tag (`emacs-lisp`, `sh`, `python`, …). - Let results land in the buffer: `emacs-lisp` default-value results are fine; shell blocks get `:results output`. - Group statements into logical blocks (setup, action, restore) so each block is one `C-c C-c`. - Prose context, manual actions (phone steps, key presses, the UI actions under test), and `Expected:` lines stay as list items between the blocks. Format: ``` ** TODO Manual testing and validation *** What we're verifying: . - #+begin_src emacs-lisp (setup-form) (action-form) #+end_src - Expected: . ``` **Promote on failure.** If the actual behavior matches Expected, the test passed — the user marks or deletes it. If it differs, the user writes the actual behavior and any notes under that test, flips its header to a `TODO`, and promotes it to a top-level task. The structured shape is what makes that one-step: the failing case already carries the repro and the expected result, so it becomes a tracked bug without rewriting anything. Use this whenever the verification gap from "When You Cannot Verify" above is a human-in-the-loop check rather than a command the user can run blind. Write every such test you want run, not a representative sample — the user runs the checklist once and reports back. ## Before Committing Before any commit: 1. Run the test suite — confirm all tests pass 2. Run the linter — confirm no new warnings 3. Run the type checker — confirm no new errors 4. Review the diff — confirm only intended changes are staged Do not commit based on the assumption that nothing broke. Verify.