author: Craig Jennings <c@cjennings.net>, 2026-05-22 14:26:43 -0500
committer: Craig Jennings <c@cjennings.net>, 2026-05-22 14:26:43 -0500
commit: 3916dc446c8925f64a974498d326637b34d46575
tree: c6f685d27707511a6f0bd8fc18279e70371d45f7 /five-whys
parent: 282a35d30edca8c5ae7a5955111440908413f267
docs(skills): tighten debug, root-cause-trace, and five-whys

Three audit-pass fixes across the debugging skills.

debug now captures environment and recent-change context (versions, flags, dataset, seed/clock, concurrency, recent commits) as a Phase-1 step. Many intermittent bugs live in state or environment, not a local code path, and "what changed recently" is often the fastest route to the cause.

root-cause-trace's defense-in-depth said to add a check at every layer that could have caught the bad value, which breeds validation spam. It now adds checks only at boundary-owning layers (ingress, persistence, the invariant owner, final render), and says a pass-through function that owns neither a boundary nor an invariant shouldn't get a duplicate null check.

five-whys now makes each link carry an evidence field and a counterfactual: if you remove this cause, does the symptom above still happen? That's the guard against a tidy chain that reads well but wouldn't have prevented the failure.
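
As an illustration of the five-whys change, the per-link evidence and counterfactual requirement could be modeled roughly as below. This is a minimal sketch and not part of the commit: the `Link` class, its field names, and `audit_chain` are hypothetical, and the second link in the sample chain is an invented placeholder for an unverified guess. Only the first link's question, answer, and evidence come from the SKILL.md example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    """One why/answer pair in a five-whys chain (hypothetical structure)."""
    question: str
    answer: str
    # Things you can point to: a log line, a commit, a metric, a config value.
    evidence: List[str] = field(default_factory=list)
    # True: removing the cause plausibly removes the symptom above it.
    # False: the symptom survives. None: the check has not been done yet.
    counterfactual_holds: Optional[bool] = None


def audit_chain(chain: List[Link]) -> List[str]:
    """Return one message per link that lacks evidence or a passing counterfactual."""
    problems = []
    for i, link in enumerate(chain, start=1):
        if not link.evidence:
            problems.append(f"why {i}: no evidence, treat the answer as a guess")
        if link.counterfactual_holds is None:
            problems.append(f"why {i}: counterfactual not checked")
        elif not link.counterfactual_holds:
            problems.append(f"why {i}: symptom survives removal, likely a coincidence")
    return problems


chain = [
    # Taken from the SKILL.md example in this diff.
    Link(
        question="Why did the release roll back?",
        answer="The cart-checkout endpoint returned 500 on ~8% of traffic.",
        evidence=[
            "rollback log 14:02",
            "APM 500-rate panel",
            "error tracker grouped on CartController#checkout",
        ],
        counterfactual_holds=True,
    ),
    # Hypothetical placeholder: an answer offered with nothing to back it up.
    Link(
        question="Why did the endpoint return 500?",
        answer="(unverified guess)",
    ),
]

problems = audit_chain(chain)
for message in problems:
    print(message)
```

In this sketch the first link passes cleanly and the second is flagged twice, once for missing evidence and once for the unchecked counterfactual. That mirrors the point of the change: a guessed link should be caught before further whys are stacked on top of it.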
Diffstat (limited to 'five-whys')
-rw-r--r-- five-whys/SKILL.md | 9
1 files changed, 8 insertions, 1 deletions
diff --git a/five-whys/SKILL.md b/five-whys/SKILL.md
index 7d7080d..e645f82 100644
--- a/five-whys/SKILL.md
+++ b/five-whys/SKILL.md
@@ -37,13 +37,20 @@ Good: "The 2026-04-17 release was rolled back at 14:02 after the cart-checkout e
Bad: "Our releases are flaky."
-### 2. Ask Why — One Answer
+### 2. Ask Why — One Answer, With Evidence and a Counterfactual
Not three possible answers. One best-supported answer, based on evidence you can point to. If the question genuinely has multiple independent causes, you'll branch in step 4.
+Each link in the chain owes two things, not just an answer:
+
+- **Evidence** — what you can point to that supports this cause (a log line, a commit, a metric, a config value). "It seems like" without evidence is a guess, and a guessed link derails every why below it.
+- **Counterfactual check** — if this cause were removed, would the symptom above it plausibly not have happened? If removing the cause leaves the symptom standing, you've named a coincidence, not a cause. This is the main guard against monocausal storytelling — a tidy chain that reads well but wouldn't actually have prevented the failure.
+
```
Why did the release roll back?
→ The cart-checkout endpoint returned 500 on ~8% of traffic.
+ evidence: rollback log 14:02; APM 500-rate panel; error tracker grouped on CartController#checkout
+ counterfactual: no 500 spike → no rollback trigger fires → release stays up. Holds.
```
### 3. Take the Answer as the New Question