From 3916dc446c8925f64a974498d326637b34d46575 Mon Sep 17 00:00:00 2001
From: Craig Jennings
Date: Fri, 22 May 2026 14:26:43 -0500
Subject: docs(skills): tighten debug, root-cause-trace, and five-whys

Three audit-pass fixes across the debugging skills.

debug now captures environment and recent-change context (versions,
flags, dataset, seed/clock, concurrency, recent commits) as a Phase-1
step. Many intermittent bugs live in state or environment, not a local
code path, and "what changed recently" is often the fastest route to
the cause.

root-cause-trace's defense-in-depth said to add a check at every layer
that could have caught the bad value, which breeds validation spam. It
now adds checks only at boundary-owning layers (ingress, persistence,
the invariant owner, final render), and says a pass-through function
that owns neither a boundary nor an invariant shouldn't get a duplicate
null check.

five-whys now makes each link carry an evidence field and a
counterfactual: if you remove this cause, does the symptom above still
happen? That's the guard against a tidy chain that reads well but
wouldn't have prevented the failure.
---
 root-cause-trace/SKILL.md | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-) (limited to 'root-cause-trace')

diff --git a/root-cause-trace/SKILL.md b/root-cause-trace/SKILL.md
index fad4601..bb46f41 100644
--- a/root-cause-trace/SKILL.md
+++ b/root-cause-trace/SKILL.md
@@ -99,13 +99,14 @@ Two actions, in order:
 
 **a. Fix at the trigger.** In the example: change the query to `INNER JOIN`, or explicitly handle customer-less orders (depending on intent).
 
-**b. Add defense-in-depth.** For each layer between the trigger and the symptom, ask: *could this layer have caught the bad value?* If yes, add the check.
+**b. Add defense-in-depth — at boundaries, not everywhere.** Don't armor every layer between the trigger and the symptom; that's validation spam that buries the real check and slows every call. Add a check only at a layer that *owns* a boundary or invariant:
 
-- Parser/validator layer: reject rows without `customer_id`
-- Service layer: throw if `order.customer` is nil instead of passing it downstream
-- Formatter layer: render "Unknown customer" rather than crashing
+- **Ingress / trust boundary** — where untrusted or external data first enters (parser/validator layer: reject rows without `customer_id`)
+- **Persistence boundary** — before writing to or reading from a store
+- **Invariant-owning layer** — the service that's supposed to guarantee a fact (throw if `order.customer` is nil rather than passing it downstream)
+- **Final render/output** — degrade gracefully (render "Unknown customer" rather than crashing)
 
-Each defense means the next time something similar happens, it surfaces earlier and with better context. The goal isn't any single check — it's that the bad value can't propagate silently.
+A pass-through function that neither owns the invariant nor crosses a boundary should *not* get a duplicate null check — let the boundary layers carry it. Each boundary defense means a similar bad value surfaces earlier and with better context; the goal isn't any single check, it's that the bad value can't propagate silently past the layer responsible for it.
 
 ## Adding Instrumentation
 
-- 
cgit v1.2.3
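
The boundary-only rule from the hunk above can be illustrated with a minimal Python sketch. Nothing here comes from SKILL.md itself: the `Order` type and the functions `parse_row`, `load_order`, `apply_discount`, and `render` are hypothetical names chosen for illustration, and the persistence boundary is left out to keep the example short.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    # Hypothetical record type; customer=None models a customer-less order.
    order_id: int
    customer: Optional[str]


def parse_row(row: dict) -> dict:
    """Ingress / trust boundary: reject rows without customer_id."""
    if row.get("customer_id") is None:
        raise ValueError(f"row {row.get('order_id')!r} has no customer_id")
    return row


def apply_discount(order: Order) -> Order:
    """Pass-through: owns no boundary or invariant, so no duplicate None check."""
    return order


def load_order(order: Order) -> Order:
    """Invariant-owning layer: every order it hands downstream has a customer."""
    if order.customer is None:
        raise LookupError(f"order {order.order_id} has no customer")
    return apply_discount(order)


def render(order: Order) -> str:
    """Final render/output: degrade gracefully rather than crash."""
    return f"Order {order.order_id}: {order.customer or 'Unknown customer'}"
```

In this sketch `apply_discount` deliberately carries no check: a bad value is stopped by `parse_row` or `load_order`, and `render` only has to cover anything that reaches it by another path.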