docs: add a green-baseline-before-work gate to verification and start-work

A red test suite at the start of work poisons every later "did I break this?" check: you can't tell your own regressions from pre-existing noise, and the end-of-work green bar stops being readable. verification.md now asks for a clean suite run before work begins, not only before commit. start-work runs it as Pre-work step 0.3 against the reconciled base. I added two carve-outs the original proposal lacked: a project with no suite has nothing to baseline, and a suite that can't run is the existing "When You Cannot Verify" case rather than a blocker. The step lands after the reconcile, so the baseline reflects the base the work is cut from. The Phase 4 TDD red is called out as expected, distinct from a baseline failure.
author: Craig Jennings <c@cjennings.net> 2026-06-30 13:27:24 -0400
committer: Craig Jennings <c@cjennings.net> 2026-06-30 13:27:24 -0400
commit: d0ab04751fe437b6c9509a2ff3217cda0f624edc (patch)
tree: 501250177ea7174f38795d693acdac01a8d3351d
parent: a266250c1e24c9b2f6f246206c1b726dbb84c4bf (diff)
download: rulesets-d0ab04751fe437b6c9509a2ff3217cda0f624edc.tar.gz
rulesets-d0ab04751fe437b6c9509a2ff3217cda0f624edc.zip
3 files changed, 100 insertions, 5 deletions
diff --git a/.claude/commands/start-work.md b/.claude/commands/start-work.md
index d146622..85ed6b0 100644
--- a/.claude/commands/start-work.md
+++ b/.claude/commands/start-work.md
@@ -1,12 +1,12 @@
 ---
-description: Pick up a task (Linear ticket, GitHub issue, todo.org task, or a described scope) and take it through Pre-work, Claim, Justify, Approach, Implement, Verify, and Hand-off. Three user-approval gates separate the phases. Pre-work covers eligibility, a fetch-and-reconcile against the base branch, and a source-code check that the problem still exists in the tree. The Justify gate weighs benefits, costs, impact, urgency, effort, alternatives, and ticket quality. The Approach gate covers root cause, risk, refactor prerequisites, test strategy (unit, integration, e2e, pairwise, characterization), migration and backwards-compat, feature flags, commit decomposition, and branch name. Implementation uses TDD (red, green, edge cases); a refactor audit then walks every touched file against a language-agnostic checklist, fixing each finding here or filing it as a ticket, never dropping one. A verify phase exercises the feature end-to-end locally (Playwright against localhost for web, scripted manual test otherwise) before the final gate hands off to the Review-and-Publish flow in commits.md. Use when starting work on a specific task where both "should we" and "how exactly" are worth deliberating. Do NOT use for open-ended bug investigation without a clear target (use debug first), for architectural paradigm exploration (use arch-design), for architectural decision recording (use arch-decide), when the task is trivial and obvious (just do it), or when requirements are still being shaped (use brainstorm).
+description: Pick up a task (Linear ticket, GitHub issue, todo.org task, or a described scope) and take it through Pre-work, Claim, Justify, Approach, Implement, Verify, and Hand-off. Three user-approval gates separate the phases. Pre-work covers eligibility, a fetch-and-reconcile against the base branch, a green-baseline suite run, and a source-code check that the problem still exists in the tree. The Justify gate weighs benefits, costs, impact, urgency, effort, alternatives, and ticket quality. The Approach gate covers root cause, risk, refactor prerequisites, test strategy (unit, integration, e2e, pairwise, characterization), migration and backwards-compat, feature flags, commit decomposition, and branch name. Implementation uses TDD (red, green, edge cases); a refactor audit then walks every touched file against a language-agnostic checklist, fixing each finding here or filing it as a ticket, never dropping one. A verify phase exercises the feature end-to-end locally (Playwright against localhost for web, scripted manual test otherwise) before the final gate hands off to the Review-and-Publish flow in commits.md. Use when starting work on a specific task where both "should we" and "how exactly" are worth deliberating. Do NOT use for open-ended bug investigation without a clear target (use debug first), for architectural paradigm exploration (use arch-design), for architectural decision recording (use arch-decide), when the task is trivial and obvious (just do it), or when requirements are still being shaped (use brainstorm).
 ---
 
 # /start-work: pick up a task, justify it, plan it, build it
 
 Three review gates separate the phases. The user can redirect or kill the work at each one.
 
-0. **Pre-work.** Eligibility check, fetch-and-reconcile against the base branch, source-code check that the problem still exists.
+0. **Pre-work.** Eligibility check, fetch-and-reconcile against the base branch, green-baseline suite run, source-code check that the problem still exists.
 1. **Claim.** Mark in-progress, assign, label, verify project.
 2. **Justify (gate 1).** Benefits, costs, impact, urgency, effort, alternatives, ticket quality. Stop for approval.
 3. **Approach (gate 2).** Root cause, risk, tests, migration, flag, commit decomposition. Stop for approval.
@@ -53,7 +53,7 @@ If the reference is ambiguous, ask the user to clarify before proceeding.
 
 ## Phase 0: pre-work
 
-Three checks before claiming the task. All run before any state change — no assignee added, no label written, no status moved. If any of them disqualify the task, the rollback is free.
+Four checks before claiming the task. All run before any state change — no assignee added, no label written, no status moved. If any of them disqualify the task, the rollback is free.
 
 ### 0.1 Eligibility
 
@@ -84,7 +84,18 @@ The branch this task will be cut from must reflect the remote — otherwise the
 
 4. If the current branch is *not* the base branch (e.g. left over from a prior task), surface and ask whether to switch before continuing. Don't auto-switch — the user may want to finish or stash WIP first.
 
-### 0.3 Existence check (validate the problem is real)
+### 0.3 Green baseline (confirm the tree starts known-good)
+
+Run the project's test suite now, against the reconciled base, so the baseline you build on is actually green (see the Green Baseline section in `verification.md`). This runs after 0.2 — baselining a stale tree is pointless.
+
+- **Green** — proceed to 0.4.
+- **Red** — fix the failure first, or, when it's out of scope or needs a decision, file a tracked task with the diagnosis and carry its name forward as the only tolerated failure for this work. Surface the baseline result either way so "we started from green" is on the record.
+- **No suite** — nothing to baseline. Note it and proceed (your personal/doc projects hit this).
+- **Suite can't run** (no network, missing dep, sandbox limit) — that's the "When You Cannot Verify" case in `verification.md`, not a blocker. Record what you couldn't run, name the risk, and proceed.
+
+This baseline is the green starting point; the intentional red test you write in Phase 4 (TDD) is expected and distinct from a baseline failure.
+
+### 0.4 Existence check (validate the problem is real)
 
 The ticket may describe a problem the code no longer has — fixed independently of the ticket, made obsolete by another change, or never present in the first place. Read the source to confirm the problem exists in the tree as the ticket describes, before justifying or planning the fix.
 
@@ -337,7 +348,7 @@ Follow `commits.md` exactly. Summary of the flow:
 ## Anti-patterns
 
 - **Skipping the pre-flight reconcile.** Cutting a new branch from a stale base means the whole task happens on top of yesterday's main. Conflicts surface at PR time instead of at the start; rebases later are noisier than a fetch up front.
-- **Taking the ticket's word that the problem still exists.** Tickets age. Read the source. A `git log --grep` for a fix commit is a hint, not a check — fixes ship under all kinds of commit-message wording, and the buggy behavior may be gone for reasons that never landed in a commit titled "fix." Five minutes of source-read at Phase 0.3 saves an entire Justify-and-Approach cycle on a phantom problem.
+- **Taking the ticket's word that the problem still exists.** Tickets age. Read the source. A `git log --grep` for a fix commit is a hint, not a check — fixes ship under all kinds of commit-message wording, and the buggy behavior may be gone for reasons that never landed in a commit titled "fix." Five minutes of source-read at Phase 0.4 saves an entire Justify-and-Approach cycle on a phantom problem.
 - **Skipping the Justify gate.** "This is obviously worth doing" is exactly what the gate exists to verify. If the answer really is obvious, the gate takes thirty seconds.
 - **Skipping the Approach gate.** Implementation without a plan is how scope creep happens. It is also how the user loses the chance to redirect.
 - **Marking a personal todo task DOING before Phase 2 approval.** Personal claims carry no teammate signal, so they wait until the gate clears — a killed task then needs no rollback. Team-tracker claims (Linear, GitHub) are the exception: they happen in Phase 1 to flag intent, but only after the prior state is recorded so the gate can restore it cleanly.
diff --git a/claude-rules/verification.md b/claude-rules/verification.md
index 388e0b5..26d05b5 100644
--- a/claude-rules/verification.md
+++ b/claude-rules/verification.md
@@ -13,6 +13,18 @@ This applies to every completion claim:
 - "Bug is fixed" → Run the reproduction steps. Confirm the bug is gone.
 - "No regressions" → Run the full test suite, not just the tests you added.
 
+## Green Baseline Before Starting Work
+
+Run the test suite before you start work, not only before you finish. A clean run at the start confirms the tree is in the known-good state you assume it is, so the baseline you build on and measure your changes against is actually green.
+
+If the suite is red before you touch anything, fix or explicitly triage the failure first. A pre-existing failure left in place poisons every later "did I break this?" check: you can't separate your own regressions from the noise, and the end-of-work run stops being readable as pass/fail at a glance. Work that assumes a known-good base may also be built on a broken assumption you never saw.
+
+When a pre-existing failure genuinely can't be fixed before the work begins (out of scope, or it needs a decision), record it as a tracked task with the diagnosis and carry its name forward. The green bar for the rest of the work is then explicitly "only this known failure remains," not a silent tolerance for red.
+
+Two cases are not failures of this rule. A project with no suite has nothing to baseline — note that and proceed. A suite that can't run (no network, a missing dependency, a sandbox limit) is the "When You Cannot Verify" case below, not a work-blocker — record what you couldn't run and proceed with the risk named.
+
+This is the start-of-work counterpart to the Before Committing gate below: one confirms the ground is solid before you build, the other confirms you didn't crack it.
+
 ## What Fresh Means
 
 - Run the verification command **now**, in the current session
diff --git a/docs/design/2026-06-29-green-baseline-proposal.org b/docs/design/2026-06-29-green-baseline-proposal.org
new file mode 100644
index 0000000..47de18d
--- /dev/null
+++ b/docs/design/2026-06-29-green-baseline-proposal.org
@@ -0,0 +1,72 @@
+#+TITLE: Proposal: ensure a green test run before starting work
+#+AUTHOR: Craig Jennings (via .emacs.d session)
+#+DATE: 2026-06-29
+
+* Why
+
+While running a multi-task refactor speedrun in =.emacs.d=, the very first full
+=make test= surfaced a failing test (=test-system-cmd-restart-emacs-no-service-aborts=)
+that turned out to be *pre-existing* -- it failed on clean HEAD, unrelated to
+the work. It had nothing to do with the task; it just happened to fail on this
+machine (a native-comp mock that bypasses =symbol-function= redefinition, real
+check passing because the box has =emacs.service=).
+
+Two costs landed because the red was already there when work began:
+
+1. Every later "did I break this?" suite run carried a known failure, so the
+   green bar became "only that one fails" instead of a clean pass I could read
+   at a glance. Easy to let a *new* regression hide behind the familiar red.
+2. The work assumed the tree was in a known-good state. It wasn't, and nothing
+   in the workflow forced that assumption to be checked first.
+
+The fix is cheap and general: run the suite *before* starting work, and clear
+(or explicitly triage) any failure before the work begins. A green start
+confirms we're in the known-good place we think we are, and any issue is fixed
+before it can be confused with our own changes.
+
+* Proposed change 1 -- claude-rules/verification.md
+
+Add a section (suggested placement: right after =## The Rule=, before
+=## What Fresh Means=). Proposed text, ready to paste:
+
+#+begin_example
+## Green Baseline Before Starting Work
+
+Run the test suite before you start work, not only before you finish. A clean run at the start confirms the tree is in the known-good state you assume it is, so the baseline you build on and measure your changes against is actually green.
+
+If the suite is red before you touch anything, fix or explicitly triage the failure first. A pre-existing failure left in place poisons every later "did I break this?" check: you can't separate your own regressions from the noise, and the end-of-work run stops being readable as pass/fail at a glance. Work that assumes a known-good base may also be built on a broken assumption you never saw.
+
+When a pre-existing failure genuinely can't be fixed before the work begins (out of scope, or it needs a decision), record it as a tracked task with the diagnosis and carry its name forward. The green bar for the rest of the work is then explicitly "only this known failure remains," not a silent tolerance for red.
+
+This is the start-of-work counterpart to the Before Committing gate below: one confirms the ground is solid before you build, the other confirms you didn't crack it.
+#+end_example
+
+* Proposed change 2 -- start-work skill, Pre-work phase
+
+The start-work skill already has a Pre-work phase (eligibility, fetch-and-reconcile
+against base, source-code check that the problem still exists). Add a green-baseline
+step to that phase:
+
+- Run the project's test suite before claiming the work.
+- If it's fully green, proceed.
+- If it's red, fix the failure first, or (when out of scope / needs a decision)
+  file a tracked task with the diagnosis and carry its name forward as the only
+  tolerated failure for this work.
+- Surface the baseline result so "we started from green" is on the record.
+
+This makes the verification.md principle operational at the exact moment it
+matters -- the start of a task -- the same way the Verify phase and the
+Review-and-Publish flow operationalize the end-of-work gates.
+
+* Note on why this came as a proposal, not a direct edit
+
+=.emacs.d='s =.claude/rules/*.md= are symlinks into =~/code/rulesets/claude-rules/=,
+so editing =verification.md= from the downstream session would modify the
+rulesets canonical directly. Per the cross-project rule, downstream sessions
+send rulesets the proposed change rather than editing its canonical in place.
+Hence this note instead of a local stopgap edit. The start-work skill isn't
+installed on this machine to edit anyway.
+
+The test that surfaced all this is already fixed in =.emacs.d= (commit on main:
+mock =executable-find= at the boundary instead of the helper). The durable
+process change is the part that belongs here.
author	Craig Jennings <c@cjennings.net>	2026-06-30 13:27:24 -0400
committer	Craig Jennings <c@cjennings.net>	2026-06-30 13:27:24 -0400
commit	d0ab04751fe437b6c9509a2ff3217cda0f624edc (patch)
tree	501250177ea7174f38795d693acdac01a8d3351d
parent	a266250c1e24c9b2f6f246206c1b726dbb84c4bf (diff)
download	rulesets-d0ab04751fe437b6c9509a2ff3217cda0f624edc.tar.gz rulesets-d0ab04751fe437b6c9509a2ff3217cda0f624edc.zip