review-code/SKILL.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418

---
name: review-code
description: Review code changes against engineering standards. Accepts a PR number, a SHA range (BASE..HEAD), the current branch's diff against main, staged changes, or a described scope ("the last 3 commits"). Audits CLAUDE.md adherence (reads root + per-directory CLAUDE.md), intent-vs-delivery (when given a plan/ADR/ticket), security, testing (TDD evidence + three-category coverage), conventions (conventional commits + no AI attribution), root-cause discipline, architecture (layering + scope), API contracts. Produces a structured report — Strengths, per-criterion PASS/WARN/FAIL, per-issue Critical/Important/Minor severity — ending with an explicit verdict (Approve / Request Changes / Needs Discussion) plus 1-2 sentence reasoning. Self-filters low-confidence findings; never flags pre-existing issues, lint/typecheck issues (CI handles those), or changes on unmodified lines. Use before merging a PR, before pushing a branch, or when reviewing a teammate's work. Do NOT use for proposing features (use brainstorm or arch-design), drafting implementation (use start-work or add-tests), standalone security audits (use security-check), or narrow style-only checks (a linter handles those).
---

# Review Code

Review code changes against engineering standards. Produce a structured report with strengths, per-criterion audit, severity-tagged issues, and an explicit verdict.

## Intent

Code review is a coaching channel. Every finding is a teaching opportunity — the engineer reading the review should come away with a better mental model of the failure pattern, not just a list of patches to apply. The structured report below is one half of the work; the inline pins and posted summary on the PR are where the coaching actually lands.

Reviews that block without teaching are incomplete. Reviews that teach without honest assessment are flattery. The aim is both: honest about the gaps, generous about what's right, and specific enough that the engineer can apply the lesson to the next PR.

## Usage

Point the skill at code being reviewed in any of these ways:

- `/review-code [PR_NUMBER]` — fetch the PR diff via `gh pr view` + `gh pr diff`
- `/review-code BASE_SHA..HEAD_SHA` — any git range
- `/review-code` (no args) — the current branch's diff against its merge base with `main`
- `/review-code --staged` — only staged changes (pre-commit scrutiny)
- Or describe it: "review my changes in the current branch" / "review the last 3 commits"

Optionally, provide intent context for delivery grading:

- `plan=docs/design/<feature>.md`
- `adr=docs/adr/<NNNN>-<title>.md`
- `ticket=LINEAR-<id>` (fetches the ticket for comparison)

When intent context is given, the review grades "does this match what was asked?" on top of code hygiene. When not, that section is marked N/A.

## Execution Model

For substantive reviews on large diffs: **dispatch the perspective passes as parallel sub-agents** via the Agent tool. Each sub-agent starts with a clean context window — the reviewer shouldn't inherit the implementer's mental model. For small single-commit tweaks, run inline.

## Phase 0 — Eligibility Gate

Before reviewing, confirm the review is worth running. Skip with a short note if any apply:

- PR is closed or merged (reviewing merged code is after-the-fact observation, not a gate)
- PR is a draft not marked ready-for-review
- Change is an automated dep bump (dependabot, renovate) — trust the bot + CI
- Change is trivial (whitespace-only, typo-only, revert with obvious justification)
- A `/review-code` report already exists for this SHA range (no new commits since)

If skipping, report: "Skipped — <reason>" and stop.

## Phase 1 — Gather Context

1. **Resolve the input to a concrete diff:**
   - PR: `gh pr view <n> --json title,body,baseRefName,headRefName,files` + `gh pr diff <n>`
   - SHA range: `git diff <base>..<head>` + `git log <base>..<head> --format='%h %s%n%b'`
   - Current branch: `git merge-base HEAD main` → diff from there
   - `--staged`: `git diff --cached`

2. **Collect CLAUDE.md files to audit against:**
   - Root project `CLAUDE.md` if present
   - Any `CLAUDE.md` in directories whose files the diff modified
   - Their paths go into the audit; their content guides the CLAUDE.md adherence criterion

3. **If intent context was provided**, fetch and read it. Extract: stated goal, scope, non-goals, acceptance criteria, linked ADRs.

4. **Scope summary** — record for the final report: file count, added/removed lines, commit count, touched modules.

## Phase 2 — Multi-Perspective Pass

Each perspective is a distinct review angle. For substantial changes, dispatch them as parallel sub-agents. For small changes, run sequentially inline.

Follow `subagents.md` for the dispatch contract — each perspective's prompt needs explicit scope, pasted diff context, constraints (don't flag unmodified lines, don't rewrite the PR), and a required output format. Perspectives are read-only and independent, so parallel fan-out is always safe here.

### Perspective A — CLAUDE.md Adherence

Read the CLAUDE.md files collected in Phase 1. For each rule they state, check whether the diff honors it. CLAUDE.md is writing guidance, so not every rule applies at review time — focus on rules that are *assertions about what committed code should look like*.

Example: if CLAUDE.md says "prefer `cast()` over `# type: ignore`," flag any `# type: ignore` in the diff. If it says "always run `make validate-parens` before commit," that's process guidance, not a reviewable code attribute.

### Perspective B — Shallow Bug Scan

Read the file changes in the diff. Ignore context beyond the changes themselves. Focus on **large, obvious bugs**:

- Null / undefined dereferences on values the code can't guarantee
- Off-by-one errors in boundary conditions
- Incorrect error handling (swallowing exceptions, returning success on failure paths)
- Data mutation under concurrent access without coordination
- Incorrect SQL / query shape (wrong join, missing where clause)
- Obvious crashes (`raise` without arguments, typed nil, etc.)

Avoid small issues and nitpicks; those are for the Minor severity tier, if worth mentioning at all.

### Perspective C — Git History Context

Run `git blame` on the modified lines. Read recent commits that touched the same files. Check:

- Is this change contradicting a recent deliberate choice? (Someone just fixed X; now this PR re-introduces it.)
- Is there a pattern of the same bug being fixed repeatedly here?
- Is the surrounding code style / convention consistent with what's being added?

### Perspective D — Prior PR Comments

`gh pr list` the PRs that previously touched these files. Check their review comments. If prior reviewers flagged a pattern, check whether the current diff repeats it.

### Perspective E — Code Comments In Scope

Read comments in the modified files (both touched and nearby). If the code has guidance ("DO NOT call this before X is initialized"), check the diff complies.

## Phase 3 — Criteria Audit

For each criterion below, report **PASS / WARN / FAIL** with file:line references. WARN and FAIL findings become issues in the Phase 4 summary.

### Intent vs Delivery (when intent context provided)

- Does the implementation match the plan / ADR / ticket?
- Are acceptance criteria demonstrably met?
- Scope creep: changes not in the plan?
- Missing requirements: plan items unimplemented?

Skip if no intent context; note "not evaluated" in the report.

### Security

- No hardcoded secrets, tokens, API keys, or credentials
- All user input validated at system boundary
- Parameterized queries only (no SQL string concatenation)
- No sensitive data logged (PII, tokens, passwords)
- Dependencies pinned and auditable

### Testing (TDD Evidence)

- Tests exist for all new code
- Test commits **precede** implementation commits (TDD workflow)
- Three categories covered: Normal, Boundary, Error per function; thorough on edge cases
- Tests are independent — no shared mutable state
- Mocking at external boundaries only (network, file I/O, time) — domain logic tested directly
- Test naming follows project convention
- Coverage does not decrease without justification
- Parameter-heavy functions: author considered pairwise coverage via `/pairwise-tests`? (Not required; worth noting if missing.)

### Conventions

- Type annotations on all functions (including return types)
- Conventional commit messages (`feat:`, `fix:`, `chore:`, etc.)
- **No AI attribution anywhere** — code, comments, commits, PR descriptions — see `commits.md`
- One logical change per commit
- Docstrings on public functions/classes

### Root Cause & Thoroughness

- Bug fixes address the root cause, not surface symptoms
- Changes demonstrate understanding of surrounding code
- Edge cases covered comprehensively, not just the happy path

### Architecture

- Request handlers thin; business logic in services/domain
- No unnecessary abstractions or over-engineering
- Changes scoped to what was asked (no drive-by refactoring)
- Stated architecture respected — if `.architecture/brief.md` exists, check conformance (see `arch-evaluate`)

### API Contracts

- New endpoints have typed contracts or schemas defined
- No raw dict/object responses bypassing the contract layer
- Client-side types match server-side output
- Data flows through the API layer, not direct data access from handlers

## Phase 4 — Filter and Categorize

### Confidence Filter

For each issue surfaced by any perspective or criterion, self-scrutinize: **am I really sure this is a real issue, or could I be wrong?** Rate your confidence honestly:

- **High** — verified by reading the code; matches a pattern that causes bugs in practice; or a direct CLAUDE.md violation you can cite
- **Medium** — likely real but contingent on context you didn't fully verify
- **Low** — looks off but you can't confirm; might be a false positive

**Drop Low-confidence issues before the final report.** Medium and High issues appear; Medium ones noted as such where uncertainty is relevant.

### False-Positive Filter

Do **not** flag any of these as issues:

- **Pre-existing issues** on unmodified lines — note separately as "for follow-up," don't block this PR
- **Lint / typecheck / test failures** — CI handles those; don't run builds yourself. This applies to *reviewing* a diff, not to shipping one: a pre-commit or pre-push flow still owes the local verification `verification.md` requires (run the suite, or state "not run because..."). Reading a PR doesn't duplicate CI; producing a PR doesn't get to skip it.
- **Nitpicks a senior engineer wouldn't call out** — unless CLAUDE.md explicitly requires them
- **Style issues** — formatters and linters handle these
- **Issues explicitly silenced in code** (e.g., `# type: ignore[...]` with a reason, lint ignore comments) unless the silencing is unjustified
- **Intentional changes** in functionality clearly related to the PR's stated goal
- **Changes in unmodified lines** (real issues in files the PR touches but on lines it doesn't change)
- **Framework behavior being tested** — see `testing.md` anti-patterns

### Severity Categorization

Remaining issues get tagged:

- **Critical** — merge-blockers: security holes, broken functionality, data loss risk, missing acceptance criteria, AI attribution in committed content
- **Important** — should fix: architecture problems, test gaps, error handling holes, pattern violations
- **Minor** — nice to have: missing docstrings, docstring drift, small optimizations

## Phase 5 — Output

Before printing any approve/request-changes summary for posting, run the praise/correction gate (see Posted Summary Voice, and `/voice` personal #40): scan the summary and cut every clause that describes or justifies a good change — keep praise plus verdict only. Then confirm each finding and change-request states its why, gently and briefly. This is a mechanical pass, not a judgment call.

```markdown
# Code Review — <PR title / branch name / SHA range>

**Scope:** <N files, +X / -Y lines, Z commits>
**Input:** <pr#N / base..head / current branch>
**Intent context:** <plan/ADR/ticket link, or "none provided">
**CLAUDE.md files audited:** <list of paths>

## Strengths

Up to three, each specific and with file:line. Strengths name what to repeat — they're as much coaching as the issues list. Call out the things the engineer did well that you want them doing on the next PR (good test structure, clean separation of concerns, root-cause fix rather than symptom patch). On a tiny diff, one or two honest strengths beat three padded ones; on a genuinely weak PR, "nothing notable here" is a fair finding — say it rather than manufacture praise.

- <Specific positive>
- <Specific positive>
- <Specific positive>

## Per-Criterion Audit

| Criterion | Status | Notes |
|---|---|---|
| CLAUDE.md Adherence | PASS / WARN / FAIL | ... |
| Intent vs Delivery | PASS / WARN / FAIL / N/A | ... |
| Security | PASS / WARN / FAIL | ... |
| Testing (TDD Evidence) | PASS / WARN / FAIL | ... |
| Conventions | PASS / WARN / FAIL | ... |
| Root Cause & Thoroughness | PASS / WARN / FAIL | ... |
| Architecture | PASS / WARN / FAIL | ... |
| API Contracts | PASS / WARN / FAIL | ... |

## Issues

### Critical (must fix before merge)

1. **<Title>**
   - File: `<path>:<line>`
   - Problem: <what's wrong>
   - Why it matters: <impact + generalizable principle>
   - Fix: <concrete suggestion>

### Important (should fix)

1. **<Title>**
   - File: `<path>:<line>`
   - Problem: <what's wrong>
   - Why it matters: <impact + generalizable principle>
   - Fix: <concrete suggestion>

### Minor (nice to have)

1. **<Title>** — `<file>:<line>`

## Recommendations

Meta-level suggestions: process changes, follow-up tickets, architectural drift observations.

## Verdict

**<Approve / Request Changes / Needs Discussion>**

**Reasoning:** <1-2 sentences. Grounded in the audit. For Request Changes, name what would clear the verdict so the path to approval is visible.>
```

## Critical Rules

**DO:**
- Categorize issues by actual severity — not everything is Critical
- Cite specifics — `file:line`, not vague prose
- Explain why each issue matters, not just what it is
- Acknowledge real strengths — up to three, specific; say none found when a tiny or weak diff doesn't warrant them
- Give a clear verdict with reasoning
- Trust CI for lint, typecheck, test runs; don't re-run them
- Self-filter low-confidence findings before reporting
- Treat each finding as a coaching opportunity — the engineer reading should come away with a better mental model, not just a patch list
- Frame Request Changes as a path to approval — name what would clear it, not just what's broken

**DON'T:**
- Say "looks good" without citing what was checked
- Mark style nitpicks as Critical
- Give feedback on code you didn't actually read
- Flag pre-existing issues as PR-blockers
- Use emojis or marketing-adjacent language
- Skip the verdict or hedge it
- Add any AI attribution to the output

## Example Output

```markdown
# Code Review — feat: add inventory export CSV (#447)

**Scope:** 8 files, +312 / -24 lines, 5 commits
**Input:** PR #447
**Intent context:** docs/design/inventory-export.md
**CLAUDE.md files audited:** CLAUDE.md, api/CLAUDE.md

## Strengths

- Characterization tests added before refactor (`tests/test_inventory_export.py:12-89`) — TDD evidence clear across commit order
- Export batches via generators rather than loading full inventory (`inventory/export.py:42-78`) — handles the 50k-row case without OOM
- API schema versioned from day one (`api/schemas/export.py:5`) — cleaner than most first-endpoints

## Per-Criterion Audit

| Criterion | Status | Notes |
|---|---|---|
| CLAUDE.md Adherence | PASS | No `# type: ignore` without justification; conventional commits clean |
| Intent vs Delivery | PASS | Matches plan §3.2; acceptance criteria met |
| Security | WARN | User-provided filename not sanitized (see Important #1) |
| Testing (TDD Evidence) | PASS | 5 test commits precede 3 implementation commits |
| Conventions | PASS | All conventional, no AI attribution detected |
| Root Cause & Thoroughness | PASS | Empty inventory, Unicode, large batches all covered |
| Architecture | PASS | Handler thin; logic in inventory service |
| API Contracts | PASS | Schema defined; TS types match in client/ |

## Issues

### Critical

None.

### Important

1. **Filename injection via user input**
   - File: `api/views/inventory.py:42`
   - Problem: User-provided `filename` passed directly to `Content-Disposition` header
   - Why it matters: CRLF injection → header smuggling; also sets up a path-traversal risk if ever used for disk writes
   - Fix: `secure_filename(user_filename)` (werkzeug) or regex-strip to `[A-Za-z0-9._-]+`

### Minor

1. **Missing type on private helper** — `inventory/export.py:22` — `_chunked()` return type unspecified
2. **Docstring drift** — `inventory/service.py:104` — docstring describes pre-batch behavior

## Recommendations

- Consider `/pairwise-tests` on the `export_options` function (4 parameters × 2-3 values each); currently 3 hand-written tests cover a fraction of the combinatorial space

## Verdict

**Request Changes**

**Reasoning:** Core implementation is strong — TDD evidence, thin handler, proper schema, good edge coverage. The filename-injection issue is a must-fix before merge; once resolved this is a clean approve.
```

## Anti-Patterns

- **All-caps everything.** If everything is Critical, nothing is. Be honest about severity.
- **Skipping strengths out of laziness.** On a substantive diff, if you found nothing good you didn't read carefully — name up to three specifics. The exception is a tiny or genuinely weak diff, where an honest "nothing notable" is the right call and padding would be synthetic praise.
- **Verdict without reasoning.** "Request Changes" alone is unhelpful. One or two sentences on why.
- **Criteria skipped silently.** If you didn't check API contracts, say "N/A — no API changes in this diff," not silence.
- **Reviewing code you didn't read.** Don't feedback on files you skimmed.
- **Running the build yourself.** CI does that. Don't re-verify what CI is for.
- **Hedging ("might be an issue").** If you can't commit to it, it's Low-confidence — drop it.
- **Blocking without teaching.** A Request Changes verdict without coaching is incomplete. Name the principle behind the fix, not just the fix.
- **Coaching without honesty.** Inflating Strengths to soften a rough review reads as flattery. Honest assessment with generous framing beats puffed-up praise.

## Hand-Off

- **Critical** → must be addressed before merge; author fixes, re-review via `/review-code` on the updated SHA
- **Important** → fix, or deliberately defer with an ADR (run `/arch-decide`)
- **Minor** → follow-up issues or a cleanup PR
- **Intent-vs-Delivery gaps** → either file tickets for the missing pieces or update the plan to reflect reality

## Posted Summary Voice

The summary body and the inline pins work as a pair: scannable verdict on top, full coaching conversation in the pins. Read this section paired with Inline Comment Voice below — the summary is terse precisely because the inlines carry the teaching weight.

The structured report above stays local. When the verdict is posted as a GitHub review (per `commits.md` Step 2 Shape 1), keep the summary body terse — one long sentence or a few short ones is plenty. Vary the phrasing run-to-run so consecutive reviews don't read templated. Voice: an encouraging senior dev who doesn't like to talk; positive feedback is short, blunt, and lands cleanly.

Name the good thing and stop: do not explain *why* it's good. The author made the change and already knows the rationale, so justifying the praise reads as sycophantic. "Clean migration off the window globals, tests cover the right edges" lands; appending "...because there are no stragglers and the provider, mocks, and Normal/Boundary/Error cases are all covered" turns a compliment into padding. Elaboration is for findings (something is wrong, here's the failure mode and the fix), never for compliments.

This holds for re-review approvals too. A re-review confirming requested changes is just "Approving." Mechanical rule: an approve summary is the verdict plus at most a bare positive ("Clean.", "Solid fix."). It must contain no clause that says what the change does or why it works. "The hoist to App fixes the crash, and the new test locks it in" is the banned pattern — it describes and justifies the change on an approve. If a clause references the code's behavior, cut it.

The asymmetry: praise gets no why, but a finding, change-request, or inline coaching note *always* gets the why. Behavior only changes when the reason lands, so a correction that just says what to fix without saying why teaches nothing. Deliver that why gently and briefly, the way a kind coach would, never as a verdict from on high. The praise-strips / correction-explains split is enforced as `/voice` personal pattern #40, which every posted review summary passes through.

Good:
- "Nice, clean, good coverage. One small naming point inline. Approving."
- "Clean shape, tests cover the right edges. Approving."
- "Solid. One blocker inline — see the auth gap. Request changes."

Bad (chatty, padded, marketing-adjacent):
- "Great work overall! This is a really clean addition. The OneToOne relationship behaves as expected, the migration is correctly dependent on 0028, CI is green across all backend/frontend checks, and the tests cover Normal/Boundary/Error cases. One small naming nit inline — fine to roll into a follow-up."

If specific praise lands somewhere, surface it as a single inline comment on the relevant line, not in the summary body. The summary stays scannable; the inline pins carry the specifics.

## Inline Comment Voice

Inline pins are where the explanation lives. Write them in complete sentences and natural prose — the way a senior dev mentoring a less-senior teammate would, not as compressed lint warnings or telegraphic fragments. State three things: the problem, why it matters, and the fix — usually one to two sentences. The why-it-matters is not optional and is not the same as the mechanism: say what actually breaks and who it hurts (a user hits a 500, a screen reader announces nothing, the next caller silently gets a wrong default), not just what the code does. Keep it brief, but brevity never costs the stakes. Don't pad to a sentence quota, and skip the obligatory "the general lesson:" closer and the "when it would surface" elaboration unless that detail is itself substantive. Plain language; no jargon strings like "raises X -> 500 in case Y." (Craig, 2026-05-25: trimmed inline length twice, then flagged that the why-it-matters must stay in the short space — substance over length, stakes over mechanism.)

The summary body delivers the verdict in one line. Every inline carries its own why-it-matters, briefly. It should leave the reader understanding what breaks, why it matters, and the fix, without padding.

Good (natural, complete sentences, coaches the reader):

> This `next(...)` call has no default, so it raises `StopIteration` and surfaces as a 500 if no matching frame is found. The `is_complete` gate keeps that unreachable today, but `next(..., None)` plus an explicit raise would harden it against `frames.json` drifting out of sync with `MAX_FRAMES`.

Bad (compressed, mashed jargon, no coaching):

> `next(...)` without a default raises StopIteration -> 500 if frames.json ever drifts out of sync with MAX_FRAMES. The is_complete gate makes that unreachable in normal flow, so just a heads-up for a follow-up — `next(..., None)` with a clear raise/return would land cleanly.

Bad (brief, but drops the why-it-matters — states mechanism and fix, never the stakes):

> `next(...)` has no default. The is_complete gate keeps that unreachable today, but `next(..., None)` plus an explicit raise would harden it.

The fix above tells the reader what to change without ever saying a user would hit a 500. Brevity is fine; dropping the stakes is not. The Good version is just as short and still names the impact.

## Content scope

Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.

This skill audits `CLAUDE.md` adherence, which creates a tension: a finding may be grounded in a `CLAUDE.md` rule, but `CLAUDE.md` is personal/project tooling a teammate can't always reach. Resolve it by where the review lands:

- **Private or internal review** (local report, your own repo, output not posted to a shared PR) — cite `CLAUDE.md` directly. "`CLAUDE.md` requires `cast()` over `# type: ignore`; this diff adds three."
- **Public or team review** (posted to a team-visible PR) — translate the rule into the engineering reason it encodes, and don't name the rules file. "Prefer `cast()` here so the type stays checked rather than silenced" beats "violates `CLAUDE.md`." The teammate can act on the reason; they can't act on a file they don't have.

The underlying rule is the same one `commits.md` states for citing personal tooling in public artifacts: surface the reason, not the rule's source.