languages/python/claude/rules/python-testing.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144

# Python Testing Rules

Applies to: `**/*.py`

Implements the core principles from `testing.md`. All rules there apply here —
this file covers Python-specific patterns.

## Framework: pytest (NEVER unittest)

Use `pytest` for all Python tests. Do not use `unittest.TestCase` unless
integrating with legacy code that requires it.

## Test Structure

Group tests in classes that mirror the source module:

```python
class TestCartService:
    """Tests for CartService."""

    @pytest.fixture
    def cart(self):
        return Cart(user_id=42)

    def test_add_item_normal(self, cart):
        """Normal: adding an in-stock item increases quantity."""
        cart.add("SKU-1", quantity=2)
        assert cart.item_count("SKU-1") == 2

    def test_add_item_boundary_zero_quantity(self, cart):
        """Boundary: quantity 0 is a no-op, not an error."""
        cart.add("SKU-1", quantity=0)
        assert cart.item_count("SKU-1") == 0

    def test_add_item_error_negative(self, cart):
        """Error: negative quantity raises ValueError."""
        with pytest.raises(ValueError, match="quantity must be non-negative"):
            cart.add("SKU-1", quantity=-1)
```

## Fixtures Over Factories

- Use `pytest` fixtures for test data setup
- Use `@pytest.fixture(autouse=True)` sparingly — prefer explicit injection
- Avoid `factory_boy` unless object graphs are genuinely complex
- Django: prefer pytest fixtures over `setUpTestData` unless you have a
  performance reason

## Parametrize for Category Coverage

Use `@pytest.mark.parametrize` to cover normal, boundary, and error cases
concisely instead of hand-writing near-duplicate tests:

```python
@pytest.mark.parametrize("quantity,valid", [
    (1, True),       # Normal
    (100, True),     # Normal: bulk
    (0, True),       # Boundary: zero is a no-op
    (-1, False),     # Error: negative
])
def test_add_item_quantity_validation(cart, quantity, valid):
    if valid:
        cart.add("SKU-1", quantity=quantity)
    else:
        with pytest.raises(ValueError):
            cart.add("SKU-1", quantity=quantity)
```

### Pairwise / Combinatorial for Parameter-Heavy Functions

When `@pytest.mark.parametrize` would require listing dozens of combinations
(feature flags × permissions × shipping × payment × etc.), switch to
combinatorial coverage via `/pairwise-tests`. The skill generates a minimal
matrix covering every 2-way parameter interaction — typically 80-99% fewer
cases than exhaustive, catching most combinatorial bugs.

Workflow: invoke `/pairwise-tests` → get a PICT model + generated test matrix
→ paste the matrix into a pytest parametrize block, or use the helper to
emit directly. The `pypict` package (`pip install pypict`) handles
generation in-process.

See `testing.md` § Combinatorial Coverage for the general rule and when
to skip.

## Measuring Coverage — `make coverage-summary`

The bundle ships a coverage summary at `.claude/scripts/coverage-summary.py`
and a Makefile fragment (`coverage-makefile.txt`) with `coverage` and
`coverage-summary` targets. After `make coverage` runs the suite under
coverage.py and writes a JSON report, `make coverage-summary` prints a
file-weighted project number and the source files no test imported.

The number to watch is that missing-file count. A module no test imports never
appears in coverage.py's report, so a line-weighted total skips it silently and
the suite looks healthier than it is. The summary counts every `*.py` under the
source dir that's absent from the report as 0%, so an untested module drags the
project number down where you can see it. It doesn't reimplement the per-file
table — `coverage report` already prints that. Copy the fragment's targets into
your own Makefile to adopt it; the bundle never edits your Makefile.

## Mocking Guidelines

### Mock these (external boundaries):
- External APIs (`requests`, `httpx`, `boto3` clients)
- Time (`freezegun` or `time-machine`)
- File uploads (Django: `SimpleUploadedFile`)
- Celery tasks (`@override_settings(CELERY_ALWAYS_EAGER=True)`)
- Email sending (Django: `django.core.mail.outbox`)

### Never mock these (internal domain):
- ORM internals (querysets, sessions, model internals)
- Model methods and properties
- Form and serializer validation
- Middleware
- Your own service functions

For domain services, use real model methods and validation — don't mock the
ORM. But a thin orchestration unit can be cleaner to test with a fake injected
at a deliberate data-access port (a repository or interface the code owns).
That's still mocking at a boundary, not at internals: replace the port you
defined, not the ORM's querysets, sessions, or model internals underneath it.

## Async Testing

Use `anyio` for async tests (not raw `asyncio`):

```python
@pytest.mark.anyio
async def test_process_order_async():
    result = await process_order_async(sample_order)
    assert result.status == "processed"
```

## Database Testing (Django)

- Mark database tests with `@pytest.mark.django_db`
- Use transactions for isolation (pytest-django default)
- Run ORM/query tests against a production-like database — the same engine
  as prod (a test Postgres or MySQL, often containerized). SQLite differs
  from Postgres/MySQL on query semantics, constraints, transactions, JSON
  columns, time zones, and indexes, so a test can pass on SQLite and fail
  in production. Reserve in-memory SQLite for pure unit tests that don't
  depend on database semantics.
- Use `select_related` / `prefetch_related` assertions to catch N+1 regressions