aboutsummaryrefslogtreecommitdiff
path: root/debug
diff options
context:
space:
mode:
authorCraig Jennings <c@cjennings.net>2026-06-11 01:36:35 -0500
committerCraig Jennings <c@cjennings.net>2026-06-11 01:36:35 -0500
commit63b9a5f6b451f63e7f18381c47f4464bda993ead (patch)
tree96bee32201ddecfb15cd3711833bd395ab849b63 /debug
parentdf948363e130e19315f28e36d909f2e4cd28627b (diff)
downloadrulesets-63b9a5f6b451f63e7f18381c47f4464bda993ead.tar.gz
rulesets-63b9a5f6b451f63e7f18381c47f4464bda993ead.zip
feat: insights follow-ups — empirical-first debugging, staging guard, invariants
From the 2026-06-11 usage report. The debug skill gains a hypothesis-discipline contract (rank candidate causes by cheapest empirical test, run probes in parallel, report only confirmed findings) targeting the serial theory-cycling the report flagged. commits.md's pre-commit checklist gains a staged-files guard covering the untracked-set and canonical-vs-mirror conventions. A small tracked CLAUDE.md carries the rulesets mirror invariant at turn zero. Two [#C] pilots filed: a read-only morning ops orchestrator and a monthly session-harvest workflow. The report's 500-token-cap finding was a mislabel: the underlying transcripts show 529 Overloaded and stream-idle-timeout errors with no token cap configured anywhere, so nothing to change there.
Diffstat (limited to 'debug')
-rw-r--r--debug/SKILL.md10
1 files changed, 10 insertions, 0 deletions
diff --git a/debug/SKILL.md b/debug/SKILL.md
index 4db6bbd..1db04df 100644
--- a/debug/SKILL.md
+++ b/debug/SKILL.md
@@ -36,6 +36,16 @@ Decide which kind of bug this is. Each branch hands off to the appropriate speci
If two branches genuinely apply (a code bug also revealed a process gap), run both — the techniques are independent and complement each other.
+### Hypothesis discipline — measure before theorizing (all branches)
+
+When the cause isn't obvious from Phase 1's evidence, do not narrate candidate causes and pick a favorite. Instead:
+
+1. **List the candidate causes**, each with the *single cheapest empirical test* that would confirm or kill it (a log read, a config dump, a timed measurement, an A/B toggle).
+2. **Run the cheap tests first** — independent read-only probes go out as one parallel batch (parallel subagents when the probes are heavy). Rank by cost-to-test, not by how plausible the theory sounds.
+3. **Report only confirmed findings.** A hypothesis with no passing test is not a finding, and never the basis for a fix. If every candidate dies, say "cause not isolated" and widen the evidence — don't promote the least-dead theory.
+
+This exists because serial hypothesis-cycling (VPN? GPU? firewall? DNS?) burns user cycles on theories a one-line measurement would have killed. Extended empirical monitoring (a battery-drain log, a before/after timing) beats a clever-sounding explanation every time.
+
## Phase 3 — Verify the Hypothesis
Once the chosen specialist has produced a hypothesis: