From 63b9a5f6b451f63e7f18381c47f4464bda993ead Mon Sep 17 00:00:00 2001 From: Craig Jennings Date: Thu, 11 Jun 2026 01:36:35 -0500 Subject: feat: insights follow-ups — empirical-first debugging, staging guard, invariants MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit From the 2026-06-11 usage report. The debug skill gains a hypothesis-discipline contract (rank candidate causes by cheapest empirical test, run probes in parallel, report only confirmed findings) targeting the serial theory-cycling the report flagged. commits.md's pre-commit checklist gains a staged-files guard covering the untracked-set and canonical-vs-mirror conventions. A small tracked CLAUDE.md carries the rulesets mirror invariant at turn zero. Two [#C] pilots filed: a read-only morning ops orchestrator and a monthly session-harvest workflow. The report's 500-token-cap finding was a mislabel: the underlying transcripts show 529 Overloaded and stream-idle-timeout errors with no token cap configured anywhere, so nothing to change there. --- debug/SKILL.md | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'debug') diff --git a/debug/SKILL.md b/debug/SKILL.md index 4db6bbd..1db04df 100644 --- a/debug/SKILL.md +++ b/debug/SKILL.md @@ -36,6 +36,16 @@ Decide which kind of bug this is. Each branch hands off to the appropriate speci If two branches genuinely apply (a code bug also revealed a process gap), run both — the techniques are independent and complement each other. +### Hypothesis discipline — measure before theorizing (all branches) + +When the cause isn't obvious from Phase 1's evidence, do not narrate candidate causes and pick a favorite. Instead: + +1. **List the candidate causes**, each with the *single cheapest empirical test* that would confirm or kill it (a log read, a config dump, a timed measurement, an A/B toggle). +2. **Run the cheap tests first** — independent read-only probes go out as one parallel batch (parallel subagents when the probes are heavy). Rank by cost-to-test, not by how plausible the theory sounds. +3. **Report only confirmed findings.** A hypothesis with no passing test is not a finding, and never the basis for a fix. If every candidate dies, say "cause not isolated" and widen the evidence — don't promote the least-dead theory. + +This exists because serial hypothesis-cycling (VPN? GPU? firewall? DNS?) burns user cycles on theories a one-line measurement would have killed. Extended empirical monitoring (a battery-drain log, a before/after timing) beats a clever-sounding explanation every time. + ## Phase 3 — Verify the Hypothesis Once the chosen specialist has produced a hypothesis: -- cgit v1.2.3