aboutsummaryrefslogtreecommitdiff
path: root/docs/design/2026-06-29-green-baseline-proposal.org
blob: 47de18db6c4c7ac17c83007f8aa2217663e8d1a8 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
#+TITLE: Proposal: ensure a green test run before starting work
#+AUTHOR: Craig Jennings (via .emacs.d session)
#+DATE: 2026-06-29

* Why

While running a multi-task refactor speedrun in =.emacs.d=, the very first full
=make test= surfaced a failing test (=test-system-cmd-restart-emacs-no-service-aborts=)
that turned out to be *pre-existing* -- it failed on clean HEAD, unrelated to
the work. It had nothing to do with the task; it just happened to fail on this
machine (a native-comp mock that bypasses =symbol-function= redefinition, real
check passing because the box has =emacs.service=).

Two costs landed because the red was already there when work began:

1. Every later "did I break this?" suite run carried a known failure, so the
   green bar became "only that one fails" instead of a clean pass I could read
   at a glance. Easy to let a *new* regression hide behind the familiar red.
2. The work assumed the tree was in a known-good state. It wasn't, and nothing
   in the workflow forced that assumption to be checked first.

The fix is cheap and general: run the suite *before* starting work, and clear
(or explicitly triage) any failure before the work begins. A green start
confirms we're in the known-good place we think we are, and any issue is fixed
before it can be confused with our own changes.

* Proposed change 1 -- claude-rules/verification.md

Add a section (suggested placement: right after =## The Rule=, before
=## What Fresh Means=). Proposed text, ready to paste:

#+begin_example
## Green Baseline Before Starting Work

Run the test suite before you start work, not only before you finish. A clean run at the start confirms the tree is in the known-good state you assume it is, so the baseline you build on and measure your changes against is actually green.

If the suite is red before you touch anything, fix or explicitly triage the failure first. A pre-existing failure left in place poisons every later "did I break this?" check: you can't separate your own regressions from the noise, and the end-of-work run stops being readable as pass/fail at a glance. Work that assumes a known-good base may also be built on a broken assumption you never saw.

When a pre-existing failure genuinely can't be fixed before the work begins (out of scope, or it needs a decision), record it as a tracked task with the diagnosis and carry its name forward. The green bar for the rest of the work is then explicitly "only this known failure remains," not a silent tolerance for red.

This is the start-of-work counterpart to the Before Committing gate below: one confirms the ground is solid before you build, the other confirms you didn't crack it.
#+end_example

* Proposed change 2 -- start-work skill, Pre-work phase

The start-work skill already has a Pre-work phase (eligibility, fetch-and-reconcile
against base, source-code check that the problem still exists). Add a green-baseline
step to that phase:

- Run the project's test suite before claiming the work.
- If it's fully green, proceed.
- If it's red, fix the failure first, or (when out of scope / needs a decision)
  file a tracked task with the diagnosis and carry its name forward as the only
  tolerated failure for this work.
- Surface the baseline result so "we started from green" is on the record.

This makes the verification.md principle operational at the exact moment it
matters -- the start of a task -- the same way the Verify phase and the
Review-and-Publish flow operationalize the end-of-work gates.

* Note on why this came as a proposal, not a direct edit

=.emacs.d='s =.claude/rules/*.md= are symlinks into =~/code/rulesets/claude-rules/=,
so editing =verification.md= from the downstream session would modify the
rulesets canonical directly. Per the cross-project rule, downstream sessions
send rulesets the proposed change rather than editing its canonical in place.
Hence this note instead of a local stopgap edit. The start-work skill isn't
installed on this machine to edit anyway.

The test that surfaced all this is already fixed in =.emacs.d= (commit on main:
mock =executable-find= at the boundary instead of the helper). The durable
process change is the part that belongs here.