From 63b9a5f6b451f63e7f18381c47f4464bda993ead Mon Sep 17 00:00:00 2001
From: Craig Jennings <c@cjennings.net>
Date: Thu, 11 Jun 2026 01:36:35 -0500
Subject: feat: insights follow-ups — empirical-first debugging, staging guard,
 invariants
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

From the 2026-06-11 usage report. The debug skill gains a hypothesis-discipline contract (rank candidate causes by cheapest empirical test, run probes in parallel, report only confirmed findings) targeting the serial theory-cycling the report flagged. commits.md's pre-commit checklist gains a staged-files guard covering the untracked-set and canonical-vs-mirror conventions. A small tracked CLAUDE.md carries the rulesets mirror invariant at turn zero. Two [#C] pilots filed: a read-only morning ops orchestrator and a monthly session-harvest workflow.

The report's 500-token-cap finding was a mislabel: the underlying transcripts show 529 Overloaded and stream-idle-timeout errors with no token cap configured anywhere, so nothing to change there.
---
 debug/SKILL.md | 10 ++++++++++
 1 file changed, 10 insertions(+)

(limited to 'debug')

diff --git a/debug/SKILL.md b/debug/SKILL.md
index 4db6bbd..1db04df 100644
--- a/debug/SKILL.md
+++ b/debug/SKILL.md
@@ -36,6 +36,16 @@ Decide which kind of bug this is. Each branch hands off to the appropriate speci
 
 If two branches genuinely apply (a code bug also revealed a process gap), run both — the techniques are independent and complement each other.
 
+### Hypothesis discipline — measure before theorizing (all branches)
+
+When the cause isn't obvious from Phase 1's evidence, do not narrate candidate causes and pick a favorite. Instead:
+
+1. **List the candidate causes**, each with the *single cheapest empirical test* that would confirm or kill it (a log read, a config dump, a timed measurement, an A/B toggle).
+2. **Run the cheap tests first** — independent read-only probes go out as one parallel batch (parallel subagents when the probes are heavy). Rank by cost-to-test, not by how plausible the theory sounds.
+3. **Report only confirmed findings.** A hypothesis with no passing test is not a finding, and never the basis for a fix. If every candidate dies, say "cause not isolated" and widen the evidence — don't promote the least-dead theory.
+
+This exists because serial hypothesis-cycling (VPN? GPU? firewall? DNS?) burns user cycles on theories a one-line measurement would have killed. Extended empirical monitoring (a battery-drain log, a before/after timing) beats a clever-sounding explanation every time.
+
 ## Phase 3 — Verify the Hypothesis
 
 Once the chosen specialist has produced a hypothesis:
-- 
cgit v1.2.3