---
description: Pick up a task (Linear ticket, GitHub issue, todo.org task, or a described scope) and take it through Pre-work, Claim, Justify, Approach, Implement, Verify, and Hand-off. Three user-approval gates separate the phases. Pre-work covers eligibility, a fetch-and-reconcile against the base branch, and a source-code check that the problem still exists in the tree. The Justify gate weighs benefits, costs, impact, urgency, effort, alternatives, and ticket quality. The Approach gate covers root cause, risk, refactor prerequisites, test strategy (unit, integration, e2e, pairwise, characterization), migration and backwards-compat, feature flags, commit decomposition, and branch name. Implementation uses TDD (red, green, edge cases); a refactor audit then walks every touched file against a language-agnostic checklist, fixing each finding here or filing it as a ticket, never dropping one. A verify phase exercises the feature end-to-end locally (Playwright against localhost for web, scripted manual test otherwise) before the final gate hands off to the Review-and-Publish flow in commits.md. Use when starting work on a specific task where both "should we" and "how exactly" are worth deliberating. Do NOT use for open-ended bug investigation without a clear target (use debug first), for architectural paradigm exploration (use arch-design), for architectural decision recording (use arch-decide), when the task is trivial and obvious (just do it), or when requirements are still being shaped (use brainstorm).
---

# /start-work: pick up a task, justify it, plan it, build it

Three review gates separate the phases. The user can redirect or kill the work at each one.

0. **Pre-work.** Eligibility check, fetch-and-reconcile against the base branch, source-code check that the problem still exists.
1. **Claim.** Mark in-progress, assign, label, verify project.
2. **Justify (gate 1).** Benefits, costs, impact, urgency, effort, alternatives, ticket quality. Stop for approval.
3. **Approach (gate 2).** Root cause, risk, tests, migration, flag, commit decomposition. Stop for approval.
4. **Implement.** TDD red, green, edge cases, refactor audit of every touched file.
5. **Verify.** End-to-end or scripted manual test in the local environment.
6. **Ready to commit (gate 3).** Report, stop for approval.
7. **Hand off** to the Review-and-Publish flow in `commits.md`.

## Tool availability

The full flow assumes Linear MCP, the GitHub CLI (`gh`), the `/voice` skill, and the Playwright skills are present. None is required. When an assumed tool is missing, degrade gracefully — do not block the task on it.

- **Linear MCP unavailable.** Treat the task as unticketed for tracking purposes (Phase 1 "Unticketed", Phase 7 status moves skipped). Note in the session and the final report that Linear state was not updated, so the user can sync it by hand.
- **`gh` unavailable.** Skip the GitHub claim/assign and the `gh pr create` step. Commit locally and report that the PR must be opened manually, with the suggested title and body from the Phase 7 draft.
- **`/voice` unavailable.** Draft the commit and PR text directly and flag that the voice pass did not run, so the user reviews the prose more closely at the gate.
- **Playwright skills unavailable.** Fall back to the scripted-manual verification mode in Phase 5 even for browser UI, and note the missing automation.

In every case: do the work the available tools allow, and surface what could not be done rather than failing the phase.
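
The `gh` check is the one fallback that can be probed from a shell. A minimal sketch, assuming a POSIX shell: `have_tool` and `pr_mode` are hypothetical helper names, not part of any tool. Linear MCP, `/voice`, and the Playwright skills are not visible on `PATH`, so they are out of scope here.

```sh
# Hypothetical helper: succeeds when the named command is on PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Decide how the PR step will run. Prints "cli" when the tool is present
# and "manual" when the PR has to be opened by hand.
#   $1 - command to probe (defaults to gh)
pr_mode() {
  if have_tool "${1:-gh}"; then
    echo "cli"
  else
    echo "manual"   # commit locally, report the suggested title and body
  fi
}
```

Calling `pr_mode` once at the start of the session makes the degraded path visible early instead of at Phase 7.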

## Ceremony scale

Match the ceremony to the size of the work. The full path — ticket, branch, three approval gates, commit-per-phase — fits a real change with review surface. A trivial local fix does not need it unless the user asks.

- **Trivial fix** (a typo, a one-line correction, a config tweak with no behavior surface): skip the ticket, the branch, and the gates. Make the change, verify it, report. Apply the full flow only if the user asks for it.
- **Small change** (one file, contained, low risk): keep the gates but collapse them — a single combined justify-and-approach note is enough, and one commit covers it.
- **Standard change** (the default): the full flow as written below.

When in doubt, ask the user which scale fits rather than defaulting to maximum ceremony on a two-line fix. The gates protect against wasted work and scope creep, not against small obvious edits.

## Usage

```
/start-work <task-ref>
```

`<task-ref>` can be:

- A Linear ticket ID or URL: `SE-170`, `https://linear.app/deepsat/issue/SE-170`
- A GitHub issue URL or number
- A todo.org heading reference or description: `todo.org:mission-sync refactor`
- A free-form scope description: "update the mission-card fallback"

If the reference is ambiguous, ask the user to clarify before proceeding.
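
The reference forms above can be told apart with a few pattern checks. This is a rough sketch under stated assumptions: `classify_task_ref` is a hypothetical name, Linear IDs are assumed to look like `SE-170` (uppercase team key, dash, number), and a bare number is assumed to mean a GitHub issue. Anything else falls through to free-form, which is where a clarifying question belongs.

```sh
# Classify a task reference. Prints one of: linear, github, todo, freeform.
classify_task_ref() {
  ref=$1
  # URL and prefix forms are unambiguous, so check them first.
  case "$ref" in
    https://linear.app/*)          echo linear; return ;;
    https://github.com/*/issues/*) echo github; return ;;
    todo.org:*)                    echo todo; return ;;
  esac
  if printf '%s\n' "$ref" | grep -Eq '^[A-Z][A-Z0-9]*-[0-9]+$'; then
    echo linear      # assumed ID shape, e.g. SE-170
  elif printf '%s\n' "$ref" | grep -Eq '^#?[0-9]+$'; then
    echo github      # assumption: a bare number is an issue number
  else
    echo freeform
  fi
}
```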

## Phase 0: pre-work

Three checks before claiming the task. All run before any state change — no assignee added, no label written, no status moved. If any of them disqualify the task, the rollback is free.

### 0.1 Eligibility

Skip with a short note and stop if any apply:

- Task is already Done, closed, or merged.
- Task is assigned to someone else and the user has not asked to take it over.
- Task is an obvious duplicate of something in-progress.
- Task description is so vague that even the Justify gate cannot engage. Route to `/brainstorm`.

### 0.2 Pre-flight reconcile

The branch this task will be cut from must reflect the remote — otherwise the work happens on a stale base and Phase 4 starts from a phantom HEAD. Same shape as `commits.md` Step 0.

1. `git fetch --all --prune`.
2. Identify the base branch the working branch will be cut from. Usually `main`, but `develop` or `release/*` for projects that use them. Ask the user if it isn't obvious from `git log` or project docs.
3. Check the base against its upstream:

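       git rev-list --left-right --count <base>@{u}...<base>

   The command prints two counts: how many commits the base is behind its upstream, then how many it is ahead. Decide based on the pair: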

   - **0 behind** — no-op. Continue.
   - **Behind only, clean tree, on the base branch** — fast-forward: `git merge --ff-only @{u}`.
   - **Behind only, non-checkout base** — `git fetch . origin/<base>:<base>` advances the ref without touching the working tree.
   - **Behind only, dirty tree on the base** — surface to the user. Don't auto-stash or auto-merge. Offer to commit, stash, or skip the reconcile and proceed knowing the new branch will be cut from a stale base.
   - **Diverged (behind AND ahead)** — surface to the user. Ask whether to rebase, merge, or skip. Don't auto-resolve.

4. If the current branch is *not* the base branch (e.g. left over from a prior task), surface and ask whether to switch before continuing. Don't auto-switch — the user may want to finish or stash WIP first.
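
The decision list in step 3 reduces to a small lookup on four inputs. A minimal sketch, assuming the behind and ahead counts have already been read from `git rev-list --left-right --count`; `reconcile_action` and the action names it prints are hypothetical labels for the cases above, not git commands.

```sh
# Map the reconcile inputs to one of the actions from step 3.
#   $1 behind  - commits on the upstream that the base lacks
#   $2 ahead   - commits on the base that the upstream lacks
#   $3 on_base - "yes" when the base branch is currently checked out
#   $4 tree    - "clean" or "dirty"
reconcile_action() {
  if [ "$1" -eq 0 ]; then
    echo noop              # nothing to pull in, continue
  elif [ "$2" -gt 0 ]; then
    echo ask-diverged      # behind AND ahead: never auto-resolve
  elif [ "$3" != yes ]; then
    echo advance-ref       # move the ref without touching the working tree
  elif [ "$4" = clean ]; then
    echo fast-forward      # git merge --ff-only @{u}
  else
    echo ask-dirty         # dirty tree on the base: surface to the user
  fi
}
```

The two `ask-*` results are the cases where the flow stops and the user decides. The function only names the case and changes nothing in the repository.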

### 0.3 Existence check (validate the problem is real)

The ticket may describe a problem the code no longer has — fixed independently of the ticket, made obsolete by another change, or never present in the first place. Read the source to confirm the problem exists in the tree as the ticket describes, before justifying or planning the fix.

The check is on the **source code**, not on commit messages. A `git log --grep` catches the obvious "someone shipped this last week" case but misses what matters: a behavior no longer reachable, a guard added in a sibling commit, a refactor that incidentally removed the surface the bug lived on, a feature already implemented under a different name. The code is the truth; the log is a hint.

For **bugs**: read the code path the ticket describes. Confirm the buggy behavior is present in the source as written. If the repro steps in the ticket are concrete enough to run — a CLI command, a UI sequence, a unit test against the path — run them against the freshly-reconciled base. If both the code-read and the repro say the bug doesn't fire, surface to the user with three options:

- (a) Close as already-fixed (or never-present). Note in the close-out what the code actually does so a future reader sees why the ticket got dropped.
- (b) Repro steps in the ticket are stale or incomplete. Ping the author or dig for a real repro before proceeding.
- (c) The bug fires under conditions not yet reproduced. Proceed to Phase 1 with reduced confidence; flag this in the Justify gate so the user can redirect.

For **features**: read the source at the surface the feature would touch — file paths the ticket names, function or route names, config keys. Search for behavior, not just for names — a feature implemented under a different name still counts. If the code already does what the ticket asks for:

- (a) Already implemented. Close as already-done.
- (b) Partially implemented. Re-scope the ticket to what's left and proceed.
- (c) Implemented in a shape the ticket explicitly asks to change. Proceed with the ticket's design.

For **tests and chores**: usually skip — the work is the work. The exception is a test-for-existing-behavior ticket where the test may have been added since filing; one quick check of the test directory is enough.

If the existence check finds nothing conclusive, that's the expected case. Proceed to Phase 1.

## Phase 1: claim

Make ownership explicit. *When* the claim happens depends on where the task lives, because the cost of a wrong claim differs.

- **Personal todo tasks** carry no signal to anyone else, so there is nothing to gain by claiming before the work is justified. **Defer the todo.org claim to after the Justify gate (Phase 2).** Nothing changes in the file until the user has approved the task. If the gate kills the task, there is no rollback to do.
- **Team trackers** (Linear, GitHub) signal intent to teammates — "someone is on this" prevents duplicate work. Here claiming first earns its keep, so the claim happens now. But a claim that the Justify gate later reverses must roll back cleanly, so **record the prior state before changing it** (see each subsection), and the Phase 2 rollback restores exactly that state.

The exact steps depend on where the task lives.

### Linear ticket

1. Fetch the ticket with the Linear MCP tools. **Record the prior state** — current status, assignee(s), and label — so the Phase 2 rollback can restore exactly it.
2. Move status to **In Progress**.
3. Assign the user. If another assignee is already present, add the user as a second assignee. If the Linear API does not accept multiple assignees, post a comment ("Picking this up alongside <existing assignee>") and proceed.
4. Verify the ticket has exactly one of these labels: **Bug**, **Test**, **Chore**, or **Feature**. If missing or wrong, ask the user which applies and set it.
5. Verify the ticket's project. If unset or wrong, ask the user which project it belongs to.

### GitHub issue

1. Fetch with `gh issue view <n> --json title,body,state,assignees,labels`. **Record the prior assignee(s) and labels** from that output for the Phase 2 rollback.
2. Assign to the user: `gh issue edit <n> --add-assignee @me`.
3. Verify the Bug / Test / Chore / Feature label. Add if missing.
4. Post a comment noting you are starting work.

### Todo.org task

Deferred to after the Justify gate — no change to the file in Phase 1. Once Phase 2 is approved:

1. Locate the heading the user referenced.
2. Change the TODO keyword to `DOING`.
3. Add exactly one tag: `:bug:`, `:feature:`, `:test:`, or `:chore:`. Ask the user which applies if none is obvious. Todo.org is personal, so there is no assignee step.

Because nothing was claimed before the gate, a killed task needs no todo.org rollback.
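
The keyword and tag change can be sketched as a filter over the org file. This is an illustration, not the required mechanism: `claim_todo` is a hypothetical helper, it assumes the heading is exactly `<stars> TODO <title>` with no priority cookie or existing tags, and it writes the result to stdout so the caller decides when to replace the file.

```sh
# Print the org file with the first matching heading claimed.
#   $1 - path to the org file
#   $2 - exact heading title (text after the TODO keyword)
#   $3 - tag to add: bug, feature, test, or chore
claim_todo() {
  awk -v title="$2" -v tag="$3" '
    !claimed && match($0, /^\*+ TODO /) && substr($0, RLENGTH + 1) == title {
      stars = substr($0, 1, RLENGTH - 6)   # strip the 6 chars of " TODO "
      print stars " DOING " title " :" tag ":"
      claimed = 1
      next
    }
    { print }
  ' "$1"
}
```

A caller would redirect the output to a temporary file and move it over the original only after the Justify gate has been approved.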

### Unticketed

1. Note in the session that the work is unticketed.
2. Ask the user whether to create a ticket or issue retroactively before continuing. If no, proceed but flag in the final commit message that there is no linked ticket.

## Phase 2: justify (gate 1)

Read the task description end to end. Skim the code it references.

Then produce a justification that covers all of these, concisely:

1. **Benefits.** What is better after this lands? Concrete, not abstract.
2. **Costs.** Time, risk, reviewer bandwidth, ceremony overhead.
3. **Engineer impact.** Does it make someone's life easier? Catch a class of bug? Remove friction?
4. **End-user impact.** Behavioral change? Visible? Invisible-but-protective?
5. **Downsides.** What do we lose? Where would we regret doing this?
6. **Urgency and priority fit.** Does this align with current goals or an upcoming deadline? If the project has committed deadlines, explicitly check this against them. Anything not obviously on the critical path should be called out as "deferrable."
7. **Effort estimate.** S (under 1 hour), M (1 hour to 1 day), L (over 1 day). Rough is fine.
8. **Alternatives considered.** Is there a cheaper way? Can we defer? Can we address the root cause via a different path?
9. **Reasons not to do this.** A forced devil's-advocate verdict on whether the work should happen at all — distinct from Downsides (what the change costs) and Alternatives (cheaper paths). Surface the top three objections if real ones exist; when none rise to a genuine objection, say so in one line rather than manufacturing three (e.g. "Nothing material argues against this. No reason to defer or drop it."). Building the case against the work is cheapest at this gate, which is its purpose.
10. **Ticket quality check.** Is scope clear, are acceptance criteria concrete, are reproduction steps present for bugs? If **not clear**, stop and ask the user to choose one of:
    - (a) Bounce to `/brainstorm` to refine the ticket first.
    - (b) Ping the ticket author for clarification.
    - (c) Supply the missing info themselves right now, if it is easy for them to do so.
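
The effort buckets in item 7 are simple thresholds. A small sketch, assuming the estimate is given in minutes and that "1 day" means an 8-hour working day (480 minutes); that reading is an assumption, and `effort_size` is a hypothetical name.

```sh
# Map an estimate in minutes to the S / M / L buckets.
# Assumption: one day is 8 working hours (480 minutes).
effort_size() {
  if [ "$1" -lt 60 ]; then
    echo S      # under 1 hour
  elif [ "$1" -le 480 ]; then
    echo M      # 1 hour to 1 day
  else
    echo L      # over 1 day
  fi
}
```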

### Gate

Present the justification to the user. Stop. Wait for questions and explicit approval ("approved", "proceed", or equivalent) before starting Phase 3.

Do not generate the approach while waiting. The user may kill the task at this gate, and any pre-generated approach would be wasted work.

**On approval**, do the deferred personal claim now: for a todo.org task, run the Phase 1 "Todo.org task" steps (keyword to `DOING`, add the one tag). Team-tracker claims already happened in Phase 1, so there is nothing more to do for them here.

**If the user kills the task**, roll back only the team-tracker claim made in Phase 1, restoring the prior state recorded there:

- **Linear:** move the status back to the recorded prior status, remove the assignment you added (keep any pre-existing assignee), remove the label you added (only if it was absent before), and delete the "picking this up" comment if you posted one.
- **GitHub:** `gh issue edit <n> --remove-assignee @me` (skip this if the user was already assigned before the claim), remove any label you added that was absent before, and delete the start-work comment.
- **Todo.org:** nothing to undo — the claim was deferred and never ran.
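
"Remove the label you added, only if it was absent before" is a set difference between the recorded prior labels and the current ones. A minimal sketch, assuming both lists are available as newline-separated strings; `added_labels` is a hypothetical helper and does not call Linear or `gh` itself.

```sh
# Print the labels present now that were absent before the claim.
#   $1 - newline-separated labels recorded in Phase 1
#   $2 - newline-separated labels on the ticket now
added_labels() {
  al_prior=$1
  printf '%s\n' "$2" | while IFS= read -r al_label; do
    [ -n "$al_label" ] || continue
    # -F fixed string, -x whole line: an exact label match only.
    if ! printf '%s\n' "$al_prior" | grep -Fxq -- "$al_label"; then
      printf '%s\n' "$al_label"
    fi
  done
}
```

Only the labels this prints are candidates for removal. Everything recorded in Phase 1 stays untouched.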

## Phase 3: approach (gate 2)

Read the referenced code end to end. Understand the surrounding context: callers, callees, existing tests, adjacent modules.

Then produce an approach that covers:

1. **Root cause.** For bugs, where the bug originates, not just where it surfaces. For features, which layer owns the new behavior.
2. **Code that changes.** Files and functions, with a rough line-count estimate.
3. **Risk.** Who and what does this affect? Local (one file) or does it ripple? Flag anything that touches shared state, public APIs, or core data flow.
4. **Refactor prerequisites.** Does the codebase need restructuring before this fix is easy? If yes, that is a separate ticket and should be done first.
5. **Spec prerequisite.** Does this work need a design spec it does not already have? A spec is warranted when the work is large, has wide surface area, carries unresolved design questions, or commits to an interface others will build on. If one is needed and absent, that is a `/brainstorm` or spec step first. Flag it rather than coding ahead. For a big task this is never a silent skip: the approach summary must state explicitly why no spec is needed (e.g. scope is contained, the design is forced by the existing structure, nothing else depends on the interface), so the call is visible and challengeable at the gate. A small, contained task can pass without comment.
6. **Characterization tests.** If modifying existing untested code, write characterization tests first to lock behavior before changing it (see `testing.md`).
7. **Test strategy decomposition.** Which of these are needed, and roughly how many of each:
   - Unit tests.
   - Integration tests.
   - E2E tests.
   - Pairwise or combinatorial tests, if parameter-heavy (see `/pairwise-tests`).
8. **Migration and backwards-compat surface.** DB migration? API contract change? Frontend consumer impact? Config shape change? Flag if yes and describe the scope.
9. **Feature flag.** Does this ship behind a flag or direct? Always worth asking once.
10. **Commit decomposition.** One commit, or N commits? Each commit should be one logical change per `commits.md`. Default to bundling tests + the feature they cover into a single `feat(scope): X with tests` (or `fix(scope): X with tests`) commit — the test and the code under test belong together for review. Split into separate `test:` + `feat:` commits only when the test work is its own substantial review surface (characterization tests, fixture infrastructure, a new test harness) or when the failing test serves as a deliberate bug-report artifact in a `fix:` narrative. Size the Review-and-Publish ceremony ahead of time.
11. **Branch name.** Following the project convention: `fix/<ID>-slug`, `feature/<ID>-slug`, `chore/<ID>-slug`, or `test/<ID>-slug`. Unticketed work uses a short descriptive slug.
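
The branch-name convention in item 11 can be sketched as a slug plus a prefix. `slugify` and `branch_name` are hypothetical helpers. The slug rule (lowercase, runs of non-alphanumerics collapsed to a single dash) is an assumption, since the convention above only fixes the `type/<ID>-slug` shape.

```sh
# Lowercase the text and collapse every run of non-alphanumerics to "-".
slugify() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]' \
    | sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}

# Build a branch name.
#   $1 - type: fix, feature, chore, or test
#   $2 - ticket ID, or empty for unticketed work
#   $3 - short description
branch_name() {
  bn_slug=$(slugify "$3")
  if [ -n "$2" ]; then
    echo "$1/$2-$bn_slug"
  else
    echo "$1/$bn_slug"
  fi
}
```

For example, `branch_name fix SE-170 "Update the mission-card fallback"` prints `fix/SE-170-update-the-mission-card-fallback`.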

### Gate

Present the approach to the user. Stop. Wait for questions and explicit approval before starting Phase 4.

If the user redirects the approach, update the plan and re-present rather than silently adjusting during implementation.

## Phase 4: implement (TDD)

Follow the red-green-refactor cycle from `testing.md`.

1. **Create the branch** using the name decided in Phase 3.
2. **Red.** Write a failing test that demonstrates the bug or captures the new desired behavior. Run it. Confirm it fails for the right reason, not because the test itself is broken.
3. **Green.** Write the minimal code to make the test pass. Do not generalize yet. Do not add features the test does not require. Commit as `feat(scope): <desc> with tests` (or `fix(scope): <desc> with tests`), bundling test and implementation per the commit-decomposition decision in Phase 3 step 10. Split into separate `test:` + `feat:` commits only when that decision called for it.
4. **Edge cases.** Add tests in all three categories per `testing.md`:
   - Normal: happy path, typical input.
   - Boundary: empty inputs, nulls, minimum and maximum values, single-element collections, Unicode, long strings, time and timezone boundaries, concurrent access.
   - Error: invalid inputs, missing required parameters, permission denied, resource exhaustion, malformed data, network failures.

   Commit as `test: add edge cases for <desc>`.
5. **Refactor audit.** After tests are green, audit every file you touched in this task — not just the code you wrote, but the whole file, top to bottom. The question is no longer "is my new code clean?" but "what refactoring opportunities exist in this file, and which belong on this branch versus a follow-up ticket?"

   Work the touched-file list explicitly. For each file, walk the checklist below (a through h), note every candidate, and decide its disposition. "Touching" a file means any modification, however small — a one-line edit still qualifies the whole file for audit. Keep tests green throughout. If they go red during a change, you have altered behavior, not just form — stop and decide whether the change is intentional before proceeding.

   The checklist is language-agnostic. The same smells appear in Python, TypeScript, Go, Elisp, Rust, shell, SQL, and anything else.

   a. **Stale documentation.** Comments, docstrings, file headers, module-level summaries, READMEs, ADRs, architectural diagrams, or any prose that now contradicts the code. Update or delete. Prefer deletion when the documentation duplicates what the code, the tooling, or the runtime config already communicates — duplicated information is rotted documentation waiting to happen. The test: if a future reader would learn nothing from the doc that the code does not already say, drop it.

   b. **Duplication.** Three distinct kinds:
      - *Logic duplication*: the same computation or control flow appearing in multiple places. Extract when it appears three or more times, or when the duplication crosses an abstraction boundary, or when a future divergence between the copies would be a real bug. Two occurrences of a simple expression usually do not justify extraction. Three similar lines beat a premature abstraction.
      - *Literal duplication*: repeated strings, regexes, magic numbers, paths, URLs, error codes, keywords — any value that would need to change together. A shared constant is cheap insurance and makes the intent explicit.
      - *Intra-function expression duplication*: the same non-trivial expression evaluated twice inside one scope. Bind it to a local name once. The function gets shorter, and the two copies cannot drift apart when someone edits one.

   c. **Naming drift.** Names that describe what the identifier used to do, not what it does now. Names that mix abstraction levels (a high-level operation named after its implementation detail). Inconsistency across the module: `get_foo` next to `fetch_bar` next to `load_baz` for operations that are semantically the same. Pick one verb per concept and rename. Renaming is cheap in any language with a competent tool, and clarity compounds.

   d. **Scope and cohesion.** Functions doing two things — a name with "and" or two clauses joined by commas is the tell. Split. Related functions scattered across the file. Cluster them with a comment header or section break. Unrelated functions grouped only by a superficial property (all private, all on the same keybinding, all using the same framework feature). Group by purpose, not accident. Code reads like a book. Related concepts should be neighbors.

   e. **Premature abstraction.** Helpers with one caller that do not document intent better than the inline version — inline them. Parameters always passed the same value by every caller — drop them. Configuration knobs that no caller varies — delete them. Interfaces with a single implementation and no realistic second — collapse them. Abstractions built "for future flexibility" that have not been exercised are carrying cost with no benefit. Speculative generality is a tax you pay on every read.

   f. **Dead code.** Unused imports, uncalled functions, variables never referenced, parameters never consumed inside the body, types no one uses. Commented-out blocks kept "in case we need it later." You will not need them, and if you do, the version control history has them. Delete.

   g. **Error handling parity.** Similar operations emitting different error shapes (exceptions vs. return values vs. log-and-continue vs. silent swallow). Error messages that expose internal state unhelpfully, or that strip the context a caller needs to act. Guards present in some parallel paths but missing in others. Parity beats novelty — if three siblings behave the same way, the fourth should too, or have a documented reason not to.

   h. **Test smells.** Tests are code and rot the same way. Copy-pasted fixtures that should parametrize. Assertions that lock to implementation (exact strings, internal structure, field order) rather than behavior. Dead mocks that stub something the test no longer exercises. Mocks of internal helpers rather than external boundaries. See `testing.md` "Signs of overmocking."

   i. **Out-of-file scope.** The audit stops at the touched-file boundary. If you happen to notice a smell in a file you did not touch, do not expand the audit into it — file a ticket so the finding is not lost. Drive-by audits across the codebase balloon review time and break the working set. Exception: a rename or structural change that would leave the codebase inconsistent if shipped half-done is in scope and required.

   **Disposition for each candidate.** Every candidate must land in one of three buckets. There is no silent drop.

   - *Fix now, fold into the related feature or fix commit*: small, directly related to the task's work on this file, obviously clearer, no new risk surface.
   - *Fix now, separate `refactor:` commit on this branch*: related to the surface you touched but larger in scope, or reshaping something non-trivial. Separating it keeps the feature commit focused for review.
   - *File a ticket or todo.org entry*: the smell is real but unrelated to this task, lives in a touched file outside the task's working set, or was noticed in an untouched file. Filing — not skipping — is the default for anything that does not fit the two "fix now" cases. Capture enough detail that a future session can act on it: file path, line or function, smell category (one of a through h), and a one-line description.

   **Where to file in todo.org.** Placement matters because follow-ups nested under a parent task get orphaned when the parent closes.

   - *Epic-style parent task* (level-2 `** TODO` with multiple level-3 `*** TODO` children): file the follow-up as a level-2 *sibling* of the parent, immediately after the parent's last child block. Siblings stay visible after parent closure and don't get archived under one specific child.
   - *Standalone task* (level-2 with no children, or a level-3 inside another structure): file as a new level-2 entry in the same `* Open Work` section. Don't nest under the originating task.
   - Both cases: include a "Triggered by: YYYY-MM-DD <task or commit>" line in the new task body so a future reader sees what surfaced it.
   - Priority on the new task is its own decision, not inherited from the originating task.

   If a candidate feels too small to fix and too small to file, it was either not a real smell, or you are talking yourself out of a two-line todo entry. Write the entry.

   **Stop conditions.** The *audit* is complete when every touched file has been walked and every candidate has a disposition. The *fixing* stops earlier: ask "would a reasonable reviewer flag this?" of the remaining in-scope candidates. If the answer is no, stop fixing and file the rest. Shipping beats polishing, but filing beats forgetting.

   Commit: group meaningful refactors into a `refactor: <desc>` commit when they stand on their own. Fold small tweaks into the associated feature or fix commit when they are tied to the same scope. The commit history should let a future reader see intent per commit, not a mixture of "did the thing" and "also cleaned up five unrelated corners."
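
A filed follow-up needs the same few fields every time, so the entry can be generated. A sketch under stated assumptions: `followup_entry` is a hypothetical helper, and the `Where:` and `Smell:` body lines are an illustrative layout. Only the level-2 heading and the `Triggered by:` line come from the rules above.

```sh
# Print a level-2 todo.org entry for one audit finding.
#   $1 - date, YYYY-MM-DD         $2 - originating task or commit
#   $3 - file path                $4 - line or function
#   $5 - smell category (a to h)  $6 - one-line description
followup_entry() {
  printf '** TODO %s\n' "$6"
  printf 'Triggered by: %s %s\n' "$1" "$2"
  printf 'Where: %s (%s)\n' "$3" "$4"     # assumed field name
  printf 'Smell: %s\n' "$5"               # assumed field name
}
```

The output is meant to be appended as a sibling entry in the `* Open Work` section, never nested under the task that surfaced it.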

### Constraints

- **Root cause, not symptom.** If the task is a bug, fix where the bug originates, not where it surfaces.
- **No drive-by refactoring.** Only change code the task requires. Unrelated cleanups go in a separate ticket.
- **No hypothetical-future code.** Solve the current problem. Do not design for requirements that have not been asked.
- **Framework and library code is trusted.** Mock at boundaries (network, time, file I/O), not at internal helpers (see `testing.md` "Signs of overmocking").

## Phase 5: verify end-to-end

Unit tests prove the internals are green. They cannot prove the feature works for the user. Before the ready-to-commit gate, exercise the feature end-to-end on your local machine: a running dev server on localhost for web work, the actual editor or CLI for everything else. Never production. Production verification is a separate concern that belongs to release procedures, not to a pre-merge workflow. Skipping this phase is how "all tests green" becomes "shipped broken". In one case, local verification caught a one-second browser-open timeout that no unit test had any way to see.

Pick the verification mode that matches the project's stack.

### If the project has browser-automatable UI

Web apps, dashboards, SPAs, admin tools, any feature reachable through a browser. Write a Playwright end-to-end test that exercises:

- The happy path the feature was built for, clicking through as a user would.
- Any boundary or error cases that unit tests could not reach: authentication, cross-page navigation, state across reloads, deep-link URLs, permission-denied flows.
- The user-observable failure mode of any known upstream dependency, mocked or stubbed where needed.

The E2E test lives in the repo alongside the feature and runs in CI like any other test. Delegate the test authoring to `/playwright-js` for JavaScript or TypeScript stacks, `/playwright-py` for Python stacks. Do not write Playwright code from scratch when those skills are available.

### If the project has no browser UI

CLI tools, libraries, Emacs or editor configuration, shell scripts, daemons, anything where there is no DOM to automate. Lead the user through a scripted manual test. Provide:

1. **An explicit sequence of steps.** Specific commands to run, specific keys to press, specific files to open. Not "try the feature" but "open file X, press C-; h d, pick draft Y."
2. **The expected observable outcome at each step.** What message should appear in the echo area, what buffer should show, what file should change on disk, what exit code the process should return, what the browser should display. One expected outcome per step so failures pinpoint where.
3. **Failure signals.** What broken looks like. "If you see nothing in the echo area, the binding did not fire. If you see `No #+hugo_draft keyword`, the buffer has no Hugo front matter." Pattern-matching against known failure modes shortens diagnosis.

Wait for the user to walk through the steps and report back. Do not skip ahead. Do not assume success without the user's confirmation. If the user reports a failure, route the failure back through Phase 3 (if the approach was wrong) or Phase 4 (if the implementation was wrong), then re-verify.
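
Where a step is a plain command with a known exit code, the "one expected outcome per step" rule can be written down as a tiny checker the user runs. A minimal sketch, assuming a POSIX shell. `check_step` is a hypothetical helper that compares exit status only; steps judged by what appears on screen still need the user's eyes.

```sh
# Run one verification step and compare its exit status with the expected one.
#   $1 - step description
#   $2 - expected exit status
#   $3... - the command to run
check_step() {
  cs_desc=$1
  cs_want=$2
  shift 2
  cs_got=0
  "$@" >/dev/null 2>&1 || cs_got=$?
  if [ "$cs_got" -eq "$cs_want" ]; then
    echo "ok: $cs_desc"
  else
    # The failure signal names the step, so diagnosis starts in the right place.
    echo "FAIL: $cs_desc (expected exit $cs_want, got $cs_got)"
    return 1
  fi
}
```

A script of `check_step` lines doubles as the verification artifact to paste into the hand-off report.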

### In both modes

- **Run against a clean environment.** Restart the process, clear the cache, open a fresh browser session, re-evaluate the loaded module. Stale state masks real bugs: one reported "toggling the draft doesn't work" turned out to be stale code in a running Emacs.
- **Verify failure paths, not just the happy path.** A feature that works when nothing goes wrong is half-tested. Force an error path if the feature has one.
- **If verification reveals a unit-test gap, add the missing unit test before gate 3.** A bug you hit manually is a bug worth locking in with a test so it cannot regress.
- **Keep the verification artifact.** For browser work, the Playwright test stays in the repo. For manual scripts, paste the steps into the Phase 6 handoff report so a reviewer can re-verify on request.
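
As an illustration of the clean-environment rule, assuming the project under test happens to be Python (an assumption; the same idea applies to restarting Emacs or any other long-lived process), a check can be run in a brand-new interpreter so that no previously loaded module state carries over. `run_fresh` is a hypothetical helper name.

```python
import subprocess
import sys


def run_fresh(snippet: str) -> subprocess.CompletedProcess:
    """Run a Python snippet in a new interpreter process.

    Nothing imported or mutated in the current session is visible to the
    child, so stale in-memory code cannot mask a real bug.
    """
    return subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
        check=False,  # report the exit code instead of raising
    )


result = run_fresh("import json; print(json.dumps({'ok': True}))")
print(result.returncode, result.stdout.strip())
```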

### Stop condition

Every verified scenario produces its expected observable outcome. Any failure is routed back to Phase 3 or Phase 4 — not papered over, not marked as "known issue" without filing a follow-up ticket.

## Phase 6: ready to commit (gate 3)

Before handing off to the Review-and-Publish flow, stop and report:

- What was done. Files changed, tests added, test-suite result.
- What was verified in Phase 5, and how. For manual scripts, paste the step list so a reviewer can re-run the verification. For Playwright tests, name the test file.
- Any deviations from the Phase 3 approach that happened during implementation, and why.
- Follow-up tickets filed during the refactor audit, listed by ID so the reviewer can see what was deferred and why. "Surfaced" is not enough — these are actually filed before gate 3 clears, not left as a mental note.

Wait for explicit approval before starting the commit and PR ceremony.

If deviations are significant, the user may want to loop back and revise the approach before publishing.
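
The gate 3 report above can be sketched as a small structure with a self-check. This is an illustrative sketch only: `Gate3Report`, its field names, and the `audit_findings_deferred` counter are hypothetical, and the check encodes one reading of the rule that deferred findings must be filed, not just surfaced.

```python
from dataclasses import dataclass, field


@dataclass
class Gate3Report:
    """Hypothetical container for the items reported at gate 3."""
    files_changed: list[str]
    tests_added: list[str]
    suite_result: str
    verification: str  # manual step list, or the Playwright test file name
    deviations: list[str] = field(default_factory=list)
    follow_up_ticket_ids: list[str] = field(default_factory=list)
    audit_findings_deferred: int = 0  # findings not fixed on this branch

    def problems(self) -> list[str]:
        """Return reasons the report is not ready; an empty list means ready."""
        issues = []
        if not self.suite_result:
            issues.append("test-suite result missing")
        if not self.verification:
            issues.append("Phase 5 verification missing")
        if self.audit_findings_deferred > 0 and not self.follow_up_ticket_ids:
            issues.append("deferred audit findings have no filed ticket IDs")
        return issues
```

A report whose `problems()` list is non-empty is not yet ready to present for approval.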

## Phase 7: hand off to Review-and-Publish

Follow `commits.md` exactly. Summary of the flow:

1. Run `/review-code --staged` before each commit, or `/review-code` on the whole branch before the PR. Block on Critical or Important findings.
2. Draft the commit message to `/tmp/commit-<slug>.md`. Run `/voice personal`. Stop for approval.
3. After approval, commit.
4. Draft the PR body to `/tmp/pr-<ticket-or-slug>.md`. Body must include a `Linear:` or equivalent cross-link line. Run `/voice personal`. Stop for approval.
5. After approval, push and run `gh pr create`.
6. Post the PR URL back to the Linear ticket, GitHub issue, or todo.org entry.
7. Move the Linear or GitHub status to **Dev Review**. Todo.org has no equivalent. Leave the todo.org entry as `DOING` until the PR merges.
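
The draft-file naming and the cross-link requirement in steps 2 and 4 can be sketched as below. The helper names and the slug rule (lowercase, hyphen-separated) are assumptions for illustration; `commits.md` remains the authority on the actual flow.

```python
import re
from pathlib import Path


def slugify(text: str) -> str:
    """Lowercase, collapse runs of non-alphanumerics to '-', trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def commit_draft_path(title: str) -> Path:
    """Path for the commit-message draft: /tmp/commit-<slug>.md."""
    return Path("/tmp") / f"commit-{slugify(title)}.md"


def pr_draft_path(ticket_or_slug: str) -> Path:
    """Path for the PR-body draft; the ticket ID or slug is used as given."""
    return Path("/tmp") / f"pr-{ticket_or_slug}.md"


def has_cross_link(pr_body: str, label: str = "Linear") -> bool:
    """True if some line starts with '<label>:' followed by text on that line."""
    pattern = rf"^{re.escape(label)}:[ \t]*\S"
    return re.search(pattern, pr_body, flags=re.MULTILINE) is not None
```

Checking `has_cross_link` before asking for approval catches a missing cross-link line while the body is still a draft.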

## Anti-patterns

- **Skipping the pre-flight reconcile.** Cutting a new branch from a stale base means the whole task happens on top of yesterday's main. Conflicts surface at PR time instead of at the start; rebases later are noisier than a fetch up front.
- **Taking the ticket's word that the problem still exists.** Tickets age. Read the source. A `git log --grep` for a fix commit is a hint, not a check — fixes ship under all kinds of commit-message wording, and the buggy behavior may be gone for reasons that never landed in a commit titled "fix." Five minutes of source-read at Phase 0.3 saves an entire Justify-and-Approach cycle on a phantom problem.
- **Skipping the Justify gate.** "This is obviously worth doing" is exactly what the gate exists to verify. If the answer really is obvious, the gate takes thirty seconds.
- **Skipping the Approach gate.** Implementation without a plan is how scope creep happens. It is also how the user loses the chance to redirect.
- **Marking a personal todo task DOING before Phase 2 approval.** Personal claims carry no teammate signal, so they wait until the gate clears — a killed task then needs no rollback. Team-tracker claims (Linear, GitHub) are the exception: they happen in Phase 1 to flag intent, but only after the prior state is recorded so the gate can restore it cleanly.
- **Blurring the gates.** Write the justification, stop, wait. Do not pre-generate the approach while waiting. If the user kills the task, that speculative work is wasted.
- **Treating the Approach gate as skippable for Feature tasks.** Features especially need the migration, backwards-compat, and feature-flag questions answered up front.
- **Letting the TDD cycle drift.** If the test passes before the implementation is written, the test is wrong. Confirm the red before moving to green.
- **Skipping the refactor audit.** A green test suite is necessary, not sufficient. Walking the touched-file list against the refactor checklist catches the stale comment, the naming drift, and the duplicated expression that a reviewer will otherwise flag. Leave the code better than you found it, within scope — and file what you cannot fix on this branch.
- **Auditing only the code you wrote.** "I only changed one line, the rest of the file isn't my problem" — it is, to the extent that you file what you see. The audit is per touched file, not per diff hunk. Anything noticed in a touched file lands somewhere: this branch or a ticket.
- **Skipping the verify phase.** Green unit tests do not mean the feature works for the user. A one-second delay that looks fine on a mocked process is a broken experience on a real Hugo build. Five minutes of scripted manual testing or a Playwright run catches the gap before a reviewer does.

## Cross-references

- `commits.md`: the Review-and-Publish flow used in Phase 7.
- `testing.md`: TDD discipline, edge case categories, characterization tests, overmocking signals.
- `subagents.md`: dispatch contract for parallel code research during Phase 3 if the code surface is large.
- `/review-code`: runs inside Phase 7.
- `/brainstorm`: route here from the Phase 2 ticket-quality branch.
- `/arch-design`: route here if Phase 3 reveals an architectural question the task cannot answer on its own.
- `/arch-decide`: route here if Phase 3 surfaces a decision worth recording as an ADR.
- `/debug`: route here if Phase 2 reveals the task needs investigation before it can be justified.
- `/pairwise-tests`: route here from Phase 3 if the test matrix warrants combinatorial coverage.
- `/playwright-js`, `/playwright-py`: route here from Phase 5 to author E2E tests for web projects.