# Verification Before Completion

Applies to: `**/*`

## The Rule

Do not claim work is done without fresh verification evidence. Run the command, read the output, confirm it matches the claim, then — and only then — declare success.

This applies to every completion claim:
- "Tests pass" → Run the test suite. Read the output. Confirm all green.
- "Linter is clean" → Run the linter. Read the output. Confirm no warnings.
- "Build succeeds" → Run the build. Read the output. Confirm no errors.
- "Bug is fixed" → Run the reproduction steps. Confirm the bug is gone.
- "No regressions" → Run the full test suite, not just the tests you added.

## What Fresh Means

- Run the verification command **now**, in the current session
- Do not rely on a previous run from before your changes
- Do not assume your changes didn't break something unrelated
- Do not extrapolate from partial output — read the whole result

## Red Flags

If you find yourself using these words, you haven't verified:

- "should" ("tests should pass")
- "probably" ("this probably works")
- "I believe" ("I believe the build is clean")
- "based on the changes" ("based on the changes, nothing should break")

Replace beliefs with evidence. Run the command.

## A Passing Gate Can Skip Your New File

A green check only counts for the files the check actually ran on. When a lint, test, or format gate runs against an explicit, hand-maintained file list rather than a glob or auto-discovery, a newly-added file is silently skipped until someone adds its path by hand. The gate reports clean while never looking at the new file — a false pass that reads exactly like a real one.

The failure mode isn't tool-specific. It fires anywhere a quality gate enumerates files instead of discovering them: a Makefile lint target listing paths, a pre-commit `files:` regex that doesn't match a new extension, a CI matrix with hardcoded paths, a coverage config with an explicit include list.

So when you add a file, confirm the gates see it. Check whether each lint/test/format gate discovers files automatically or enumerates them, and if it enumerates, add the new path before trusting the green check. "Linter is clean" means nothing for a file the linter never ran on.

## When You Cannot Verify

Sometimes the verification command cannot run: the tool is absent, there is no network, a sandbox blocks it, or the environment is missing a dependency. A check that did not run must never be reported as a pass. "Unable to verify" is an honest, required outcome — not silence, and not an optimistic "should work."

When a check cannot run, report it in this order:

1. **Command attempted** — the exact command you tried to run.
2. **Why it could not run** — tool absent, no network, sandbox limit, missing environment, etc.
3. **Risk left unverified** — what might be broken that you would not know about, given the check did not run.
4. **Next command to close the gap** — the smallest command the user can run to verify themselves.

Do not let an unverifiable check vanish into a confident summary. State it plainly and hand the gap to the user.

## Handing Off Manual Verification

Some checks can only be run by the user: interactive UI a script can't drive, a live external service, visual rendering (colors, layout, faces), a real device, or anything where the verification *is* a human looking at the result. When the gap needs the user's hands or eyes — not just a command they could paste — don't bury the steps in prose. Write them as a structured, runnable checklist in the project's task file.

Create (or append to) a single parent task named **"Manual testing and validation"** in the project's todo file (`todo.org`, or the project's equivalent). Under it, write **one org sub-header per test**:

- **Title** — descriptive, naming the behavior under test (not "test 1").
- **"What we're verifying:"** — one line on the intent, so a failed test explains itself.
- **Steps** — **one action per item**, in order. Manual actions (a key to press, a phone step, a file to open, the `M-x` command under test) are list bullets, stated concretely. A step that *is* code the user will execute goes in an org src block, not a bullet — see below.
- **"Expected:"** — the observable result after the last step. One outcome, stated plainly, so a mismatch is unambiguous.

**Executable steps go in src blocks.** When a step is code the user runs — verification setup, bug-reproduction steps, walkthrough wiring — put it in an org src block so the user can execute it in place (`C-c C-c`) and read the result in the buffer, rather than copy-pasting a bullet:

- Use the correct language tag (`emacs-lisp`, `sh`, `python`, …).
- Let results land in the buffer: `emacs-lisp` default-value results are fine; shell blocks get `:results output`.
- Group statements into logical blocks (setup, action, restore) so each block is one `C-c C-c`.
- Prose context, manual actions (phone steps, key presses, the UI actions under test), and `Expected:` lines stay as list items between the blocks.

Format:

```
** TODO Manual testing and validation
*** <descriptive title>
What we're verifying: <one line>.
- <manual action — a key press, a phone step>
#+begin_src emacs-lisp
(setup-form)
(action-form)
#+end_src
- <manual action between blocks>
Expected: <the observable result after the last step>.
```

**Promote on failure.** If the actual behavior matches Expected, the test passed — the user marks or deletes it. If it differs, the user writes the actual behavior and any notes under that test, flips its header to a `TODO`, and promotes it to a top-level task. The structured shape is what makes that one-step: the failing case already carries the repro and the expected result, so it becomes a tracked bug without rewriting anything.

Use this whenever the verification gap from "When You Cannot Verify" above is a human-in-the-loop check rather than a command the user can run blind. Write every such test you want run, not a representative sample — the user runs the checklist once and reports back.

## Before Committing

Before any commit:
1. Run the test suite — confirm all tests pass
2. Run the linter — confirm no new warnings
3. Run the type checker — confirm no new errors
4. Review the diff — confirm only intended changes are staged

Do not commit based on the assumption that nothing broke. Verify.