1 files changed, 20 insertions, 4 deletions
diff --git a/add-tests/SKILL.md b/add-tests/SKILL.md
index 45ab2ac..95e7a2e 100644
--- a/add-tests/SKILL.md
+++ b/add-tests/SKILL.md
@@ -8,6 +8,15 @@ Add test coverage to existing code by analyzing, proposing, writing, and reporti
 
 `/add-tests` — Analyze the entire project.
 
+## Core Principle — Refactor for Testability First
+
+If a function mixes business logic with UI, framework APIs, or interactive I/O, testing it cleanly requires heavy mocking — which tends to test the mock instead of the code. Split the function first:
+
+- **Pure helper** — All logic, explicit parameters, deterministic, no side effects. Takes real inputs, returns values or raises. 100% testable with zero mocking.
+- **Interactive wrapper** — Thin layer that gets input from user/framework/context and delegates to the helper. Not unit-tested — testing it would test the framework.
+
+Identify refactor-first candidates during Phase 1 and surface them in the Phase 2 proposal. The user decides per function: refactor now, skip testing it this round, or accept heavy mocking (not recommended).
+
 ## Instructions
 
 Work through four phases in order. Do not skip phases.
@@ -16,7 +25,7 @@ Work through four phases in order. Do not skip phases.
 
 1. **Explore the codebase thoroughly** before proposing any tests. Read the modules in scope, trace their data flows, and understand how components interact. Understand the code's intent — not just its structure — so tests target real behavior and real risks.
 2. Discover the test infrastructure: runner, config files, existing test directories, fixtures, helpers, and shared utilities. Check for `pytest.ini`, `pyproject.toml [tool.pytest]`, `jest.config.*`, `vitest.config.*`, or equivalent.
-3. Inventory source files in scope. For each file, determine: public function/method count, whether tests exist (full, partial, or none), and what external dependencies would need mocking.
+3. Inventory source files in scope. For each file, determine: public function/method count, whether tests exist (full, partial, or none), what external dependencies would need mocking, and whether any functions are **testability-blocked** — so tightly coupled to UI or framework that testing would require mocking the code under test itself. Flag blocked functions as refactor-first candidates for the Phase 2 proposal.
 4. Identify external boundaries that need mocks: API clients, database calls, file I/O, time/date, third-party services. Follow the mocking rules in `testing.md` — mock at external boundaries only, never mock code under test.
 5. Prioritize files by risk using the coverage targets from `testing.md`:
    - Business logic / domain: 90%+
@@ -35,8 +44,9 @@ Work through four phases in order. Do not skip phases.
    > - **Exhaustive**: Y hand-written cases covering every combination; only needed in regulated / provably-complete-coverage contexts."
 
    Pairwise is the pragmatic default. Exhaustive only when the user names a specific reason (regulatory, safety-critical, audit evidence). If the user doesn't pick explicitly, proceed with pairwise for that function and continue with category coverage for the rest.
-9. For large codebases, group the plan into batches (e.g., "Batch 1: core domain models, Batch 2: API endpoints").
-10. **Stop and ask the user for confirmation.** The user may add, remove, or reprioritize tests before you proceed. Do not write any test files until the user approves.
+9. **For each testability-blocked function**, present it separately in the proposal: "Function `<name>` is testability-blocked — refactor into pure helper + interactive wrapper before testing, skip testing it this round, or accept heavy mocking (not recommended)?" Decide per function.
+10. For large codebases, group the plan into batches (e.g., "Batch 1: core domain models, Batch 2: API endpoints").
+11. **Stop and ask the user for confirmation.** The user may add, remove, or reprioritize tests before you proceed. Do not write any test files until the user approves.
 
 ### Phase 3: Write
 
@@ -53,10 +63,16 @@ Work through four phases in order. Do not skip phases.
     - **TypeScript/JavaScript:** Follow `typescript-testing.md` — use Jest + React Testing Library, query priority (`getByRole` > `getByLabelText` > `getByText` > `getByTestId`), `waitFor` for async, type-safe test helpers in `tests/helpers/`.
 14. Mock only at external boundaries. Never mock the code under test, domain logic, or framework behavior.
 15. Run the tests after writing each batch. Confirm they pass. If a test fails against the current code, that is a discovered bug — do not delete the test or make it pass artificially. Note it for the report.
+16. **When a test fails, diagnose whether the test or the production code is wrong.** Don't reflexively "fix" either side.
+    - Read the test's intent (name, arrange, assertion).
+    - Read the production code path.
+    - Decide: if the expected behavior is reasonable and defensive but the code doesn't match, it's a production bug — fix the code, keep the test. If the test expectation itself is wrong (miscounted, misread the spec), fix the test. If the spec is unclear, stop and ask the user.
+    - Common patterns: null/nil input crashing where a graceful return was expected → production bug (add guard). Test asserted 10 replacements but the input had 9 trigger chars → test bug. Regex silently fails to match valid URLs → production bug (real defect surfaced by testing).
+    - Record production bugs discovered for the Phase 4 report. Tests that correctly fail against current code are bugs *found*, not failures to suppress.
 
 ### Phase 4: Report
 
-16. Summarize what was created:
+17. Summarize what was created:
     - Test files created (with paths)
     - Test count by category (normal / boundary / error)
     - Coverage change (if a coverage tool is configured and measurable)