| author | Craig Jennings <c@cjennings.net> | 2026-06-30 13:27:24 -0400 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-06-30 13:27:24 -0400 |
| commit | d0ab04751fe437b6c9509a2ff3217cda0f624edc (patch) | |
| tree | 501250177ea7174f38795d693acdac01a8d3351d /docs/design | |
| parent | a266250c1e24c9b2f6f246206c1b726dbb84c4bf (diff) | |
docs: add a green-baseline-before-work gate to verification and start-work
A red test suite at the start of work poisons every later "did I break this?" check: you can't tell your own regressions from pre-existing noise, and the end-of-work green bar stops being readable. verification.md now asks for a clean suite run before work begins, not only before commit. start-work runs it as Pre-work step 0.3 against the reconciled base.
I added two carve-outs the original proposal lacked: a project with no suite has nothing to baseline, and a suite that can't run is the existing "When You Cannot Verify" case rather than a blocker. The step lands after the reconcile, so the baseline reflects the base the work is cut from. The Phase 4 TDD red is called out as expected, distinct from a baseline failure.
Diffstat (limited to 'docs/design')
| -rw-r--r-- | docs/design/2026-06-29-green-baseline-proposal.org | 72 |
1 files changed, 72 insertions, 0 deletions
diff --git a/docs/design/2026-06-29-green-baseline-proposal.org b/docs/design/2026-06-29-green-baseline-proposal.org
new file mode 100644
index 0000000..47de18d
--- /dev/null
+++ b/docs/design/2026-06-29-green-baseline-proposal.org
@@ -0,0 +1,72 @@
+#+TITLE: Proposal: ensure a green test run before starting work
+#+AUTHOR: Craig Jennings (via .emacs.d session)
+#+DATE: 2026-06-29
+
+* Why
+
+While running a multi-task refactor speedrun in =.emacs.d=, the very first full
+=make test= surfaced a failing test (=test-system-cmd-restart-emacs-no-service-aborts=)
+that turned out to be *pre-existing* -- it failed on clean HEAD, unrelated to
+the work. It had nothing to do with the task; it just happened to fail on this
+machine (a native-comp mock that bypasses =symbol-function= redefinition, real
+check passing because the box has =emacs.service=).
+
+Two costs landed because the red was already there when work began:
+
+1. Every later "did I break this?" suite run carried a known failure, so the
+   green bar became "only that one fails" instead of a clean pass I could read
+   at a glance. Easy to let a *new* regression hide behind the familiar red.
+2. The work assumed the tree was in a known-good state. It wasn't, and nothing
+   in the workflow forced that assumption to be checked first.
+
+The fix is cheap and general: run the suite *before* starting work, and clear
+(or explicitly triage) any failure before the work begins. A green start
+confirms we're in the known-good place we think we are, and any issue is fixed
+before it can be confused with our own changes.
+
+* Proposed change 1 -- claude-rules/verification.md
+
+Add a section (suggested placement: right after =## The Rule=, before
+=## What Fresh Means=). Proposed text, ready to paste:
+
+#+begin_example
+## Green Baseline Before Starting Work
+
+Run the test suite before you start work, not only before you finish. A clean run at the start confirms the tree is in the known-good state you assume it is, so the baseline you build on and measure your changes against is actually green.
+
+If the suite is red before you touch anything, fix or explicitly triage the failure first. A pre-existing failure left in place poisons every later "did I break this?" check: you can't separate your own regressions from the noise, and the end-of-work run stops being readable as pass/fail at a glance. Work that assumes a known-good base may also be built on a broken assumption you never saw.
+
+When a pre-existing failure genuinely can't be fixed before the work begins (out of scope, or it needs a decision), record it as a tracked task with the diagnosis and carry its name forward. The green bar for the rest of the work is then explicitly "only this known failure remains," not a silent tolerance for red.
+
+This is the start-of-work counterpart to the Before Committing gate below: one confirms the ground is solid before you build, the other confirms you didn't crack it.
+#+end_example
+
+* Proposed change 2 -- start-work skill, Pre-work phase
+
+The start-work skill already has a Pre-work phase (eligibility, fetch-and-reconcile
+against base, source-code check that the problem still exists). Add a green-baseline
+step to that phase:
+
+- Run the project's test suite before claiming the work.
+- If it's fully green, proceed.
+- If it's red, fix the failure first, or (when out of scope / needs a decision)
+  file a tracked task with the diagnosis and carry its name forward as the only
+  tolerated failure for this work.
+- Surface the baseline result so "we started from green" is on the record.
+
+This makes the verification.md principle operational at the exact moment it
+matters -- the start of a task -- the same way the Verify phase and the
+Review-and-Publish flow operationalize the end-of-work gates.
+
+* Note on why this came as a proposal, not a direct edit
+
+=.emacs.d='s =.claude/rules/*.md= are symlinks into =~/code/rulesets/claude-rules/=,
+so editing =verification.md= from the downstream session would modify the
+rulesets canonical directly. Per the cross-project rule, downstream sessions
+send rulesets the proposed change rather than editing its canonical in place.
+Hence this note instead of a local stopgap edit. The start-work skill isn't
+installed on this machine to edit anyway.
+
+The test that surfaced all this is already fixed in =.emacs.d= (commit on main:
+mock =executable-find= at the boundary instead of the helper). The durable
+process change is the part that belongs here.
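
For illustration only, the baseline step described above (run the suite before claiming the work, proceed on green, fix or file a tracked task on red), together with the two carve-outs from the commit message, could be sketched roughly as follows. This is not part of the commit or of the start-work skill: the names `Baseline`, `check_baseline`, and `next_step` are hypothetical, and the sketch assumes the project's suite is a single command whose exit code is 0 on a clean run.

```python
import subprocess
from enum import Enum
from typing import Optional, Sequence


class Baseline(Enum):
    """Possible outcomes of a pre-work baseline run (hypothetical names)."""
    GREEN = "green"
    RED = "red"
    NO_SUITE = "no-suite"
    CANNOT_VERIFY = "cannot-verify"


def check_baseline(test_cmd: Optional[Sequence[str]]) -> Baseline:
    """Run the suite once, before any work, and classify the result.

    Assumption: exit code 0 means a fully green run, anything else is red.
    """
    if not test_cmd:
        # Carve-out 1: a project with no suite has nothing to baseline.
        return Baseline.NO_SUITE
    try:
        result = subprocess.run(list(test_cmd), capture_output=True, text=True)
    except OSError:
        # Carve-out 2: the suite could not even be launched, so this is
        # the "When You Cannot Verify" case rather than a blocker.
        return Baseline.CANNOT_VERIFY
    return Baseline.GREEN if result.returncode == 0 else Baseline.RED


def next_step(outcome: Baseline) -> str:
    """Map a baseline outcome to the action the proposal describes."""
    actions = {
        Baseline.GREEN: "Proceed: the work started from green.",
        Baseline.RED: (
            "Fix the failure first, or file a tracked task with the "
            "diagnosis and carry its name forward as the only tolerated failure."
        ),
        Baseline.NO_SUITE: "Nothing to baseline; continue.",
        Baseline.CANNOT_VERIFY: "Handle as the 'When You Cannot Verify' case.",
    }
    return actions[outcome]
```

A caller might run `print(next_step(check_baseline(["make", "test"])))` against the reconciled base and record the printed line, so that the starting state is on the record before any change is made.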
