-rw-r--r--  Makefile                  |   3
-rw-r--r--  brainstorm/SKILL.md       | 191
-rw-r--r--  five-whys/SKILL.md        | 188
-rw-r--r--  memorize/SKILL.md         | 192
-rw-r--r--  root-cause-trace/SKILL.md | 236
5 files changed, 809 insertions, 1 deletions
@@ -4,7 +4,8 @@ SHELL := /bin/bash SKILLS_DIR := $(HOME)/.claude/skills RULES_DIR := $(HOME)/.claude/rules SKILLS := c4-analyze c4-diagram debug add-tests respond-to-review review-pr fix-issue security-check \ - arch-design arch-decide arch-document arch-evaluate + arch-design arch-decide arch-document arch-evaluate \ + brainstorm memorize root-cause-trace five-whys RULES := $(wildcard claude-rules/*.md) LANGUAGES := $(notdir $(wildcard languages/*)) diff --git a/brainstorm/SKILL.md b/brainstorm/SKILL.md new file mode 100644 index 0000000..2f84aa6 --- /dev/null +++ b/brainstorm/SKILL.md @@ -0,0 +1,191 @@ +--- +name: brainstorm +description: Refine a vague idea into a validated design through structured one-question-at-a-time dialogue, diverse option exploration (three conventional + three tail samples), and chunked validation (200-300 words at a time). Produces `docs/design/<topic>.md` as the output artifact. Use when shaping a new feature, service, or workflow before implementation begins — or when a "we should probably…" idea needs to become concrete enough to build. Do NOT use for mechanical well-defined work (renames, reformats, dependency bumps), for system-level architecture choices (use arch-design), for recording a single decision that has already been made (use arch-decide), or for debugging an existing error (use root-cause-trace or debug). Synthesized from the Agentic-Context-Engineering / SDD brainstorm pattern — probabilistic diversity sampling originated there. +--- + +# Brainstorm + +Turn a rough idea into a design document concrete enough to implement. One question at a time, diverse options considered honestly, design validated in chunks. 
+ +## When to Use + +- Shaping a new feature, component, service, or workflow before code exists +- Translating "we should probably …" into something buildable +- Converting a noticed-but-unshaped problem into a design +- Exploring whether an idea is worth building before committing to it + +## When NOT to Use + +- Mechanical, well-defined work (rename, reformat, upgrade a dependency, apply a migration) +- System-level architecture (use `arch-design`) +- Recording a specific decision already made (use `arch-decide`) +- Debugging an existing error (use `root-cause-trace` or `debug`) +- Writing code whose shape is already clear to you + +## Workflow + +Three phases. Each converges a little more. Each is validated before moving to the next. + +### Phase 1 — Understand the Idea + +Dialogue, not interrogation. Before the first question, read the project context: the relevant directory, recent commits, any existing docs, the language and stack. Ground your questions in what's already there. + +**Rules:** + +- **One question per message.** Never batch. Respondents answer the easiest question and skip the rest when you batch. +- **Prefer multiple choice.** Easier to answer, faster to skim, surfaces the option space for free. +- **Open-ended when the option space is too large to enumerate.** Then refine with follow-ups. +- **Focus the first questions on purpose, not mechanism.** Mechanism comes in phase 2. + +**Topics to cover (not a script — skip what's already clear):** + +- What problem does this solve? +- Who is the user or caller? +- What's the smallest version that would be useful? +- What's explicitly out of scope? +- What are the success criteria (measurable if possible)? +- What constraints apply — team size, timeline, existing code, stack? + +**Stop when** you can state the idea back in one sentence and the user confirms. Don't keep asking for the sake of thoroughness. 
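One hypothetical shape for a phase-1 message — a single multiple-choice question, with the topic and options invented for illustration:

```
One question for now: who is the primary caller of this feature?

  a) End users, through the existing web UI
  b) Internal teams, through an API we would publish
  c) A scheduled job — no interactive caller
  d) Something else (say more)
```

Note the single question, the enumerated option space, and the escape hatch for answers outside it.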
+ +### Phase 2 — Explore Approaches + +Before committing to a direction, generate **six candidate approaches**. Force diversity. + +**Composition:** + +- **Three conventional approaches** — the paths a reasonable engineer with relevant experience would consider. Each has a high prior probability of being right (roughly ≥80%). +- **Three tail samples** — approaches from genuinely different regions of the solution space. Each individually unlikely (roughly ≤10%), but together they expand what's been considered. These are the ones that surprise. + +Why tail samples matter: most teams converge on the first conventional option. The tail samples either reveal a better solution you hadn't considered, or they clarify *why* the conventional option is the right one. Either way, the recommendation that follows is stronger. + +**For each candidate, state:** + +- One-paragraph summary +- Honest pros +- Honest cons (no selling; if you can't name real cons, you haven't thought hard enough) + +**End with:** + +- Your recommendation +- Why — explicit mapping to the constraints and success criteria from phase 1 +- What's being traded away +- What becomes an open decision for `arch-decide` later + +### Phase 3 — Present the Design + +Once a direction is chosen, describe it in **200-300 word chunks**. After each chunk, ask "does this look right so far?" Never dump a wall of design prose — the user skims walls. + +**Typical sections, one chunk each:** + +1. **Architecture** — components, boundaries, key interfaces +2. **Data flow** — what moves through where, in what shape +3. **Persistence** — what's stored, where, for how long, with what durability +4. **Error handling** — expected failures, responses, user-facing behavior +5. **Testing approach** — what correctness means here, how it gets verified +6. **Observability** — what gets logged, measured, traced + +Skip sections that don't apply. 
Add domain-specific ones (auth flow, concurrency model, migration plan, rollout strategy) where relevant. + +**Be willing to back up.** If a chunk surfaces a question that invalidates an earlier chunk, revise. Committing to a wrong direction to avoid rework is the expensive path. + +## Output + +Write the validated design to `docs/design/<topic>.md`. Use this skeleton (omit sections that don't apply): + +```markdown +# Design: <topic> + +**Date:** <YYYY-MM-DD> +**Status:** Draft | Accepted | Implemented + +## Problem + +One paragraph. What, who, why now. + +## Non-Goals + +Bullet list of what this explicitly does not address. Prevents scope creep. + +## Approaches Considered + +### Recommended: <name> + +Summary, pros, cons. + +### Rejected: <name> + +Why not. Brief. + +(Include 2-3 rejected options — showing alternatives were weighed is itself valuable.) + +## Design + +### Architecture + +### Data Flow + +### Persistence + +### Error Handling + +### Testing + +### Observability + +## Open Questions + +Items that need decisions before or during implementation. + +- [ ] Question — likely candidate for an ADR via `arch-decide` +- [ ] Question — … + +## Next Steps + +- Open questions → `arch-decide` +- If this implies system-level structural change → `arch-design` +- Implementation → <agreed next action> +``` + +If `docs/design/` doesn't exist, create it. If a design with the same topic name exists, ask before overwriting. + +## Hand-Off + +After the design is written and agreed: + +- **Each open question** → run `arch-decide` to record as an ADR +- **System-level implications** → run `arch-design` to update the architecture brief +- **Implementation** → whatever the project's implementation path is (`fix-issue`, `add-tests`, or direct work) + +Link the design doc from wherever it's being implemented (PR description, ticket, commit). 
+ +## Review Checklist + +Before declaring the design accepted: + +- [ ] Problem stated in one paragraph that a newcomer could understand +- [ ] Non-goals listed (at least 1-2) +- [ ] At least 6 approaches considered, with 3 genuinely in the tail +- [ ] Recommendation names what it's trading away +- [ ] Each design chunk was individually validated +- [ ] Open questions listed; each has a clear path forward +- [ ] User has confirmed the design reflects their intent + +## Anti-Patterns + +- **Asking three questions in one message.** The user answers the easiest. Ask one. +- **Leading questions.** "Don't you think Postgres is right here?" isn't exploration. +- **Skipping tail samples.** If all six options are minor variations on the conventional answer, diversity wasn't actually attempted. +- **Dumping the whole design at once.** Eight hundred words without validation checkpoints means the user skims and misses the thing they'd want to push back on. +- **Hiding trade-offs.** Every approach has real cons. State them or the evaluation is theatre. +- **"We'll figure it out in implementation."** If a design question is ducked here, it becomes a larger, costlier problem during coding. Resolve now or explicitly defer with an ADR. +- **Over-specifying.** The design should be detailed enough to implement, not so detailed it's already the implementation. If you're writing function bodies in the design, stop. + +## Key Principles + +- **One question at a time.** Non-negotiable. +- **Multiple choice beats open-ended** when the option space is small enough to enumerate. +- **Six approaches, three in the tail.** Discipline around diversity. +- **Validate in chunks.** 200-300 words, then check. +- **YAGNI ruthlessly.** Remove anything not justified by the problem statement. +- **Back up when needed.** Revising early beats committing to wrong. 
diff --git a/five-whys/SKILL.md b/five-whys/SKILL.md new file mode 100644 index 0000000..3549222 --- /dev/null +++ b/five-whys/SKILL.md @@ -0,0 +1,188 @@ +--- +name: five-whys +description: Drive iterative "why?" questioning from an observed problem to its actual root cause, then propose fixes that target the root rather than the symptom. Default depth is five, but the real stop condition is reaching a cause that, if eliminated, would prevent every observed symptom in the chain — that may take three whys or eight. Handles branching (multiple contributing causes, each explored separately), validates the chain by working backward from root to symptom, and rejects "human error" as a terminal answer (keep asking why the process allowed that error). Use for process, decision, and organizational failures — missed code reviews, recurring incidents, slow deploys, flaky releases, "we've fixed this three times already" problems. Do NOT use for debugging a stack trace (use root-cause-trace, which walks the call chain), for tactical defect investigation where the fix site is local and obvious, or for blame attribution (the skill refuses to terminate at a person). Companion to root-cause-trace — that's for code execution; this is for process/decision root-causes. +--- + +# Five Whys + +A structured way to drive from a visible symptom to the underlying cause, and from there to a fix that prevents the whole class rather than the single instance. Simple, old, still works — provided the stop condition is real root cause, not the fifth why. + +## Core Idea + +Start with what happened. Ask why it happened. The answer becomes the next "what." Ask why again. Repeat until further questioning stops producing new information — that's the root. Then fix *there*, not at the symptom. + +Five is a convention, not a quota. Some chains terminate at three; others need eight. The depth isn't the point; reaching a cause that, if eliminated, prevents every symptom in the chain is the point. 
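As a minimal sketch — incident details invented, not part of the skill — a why-chain can be held as plain data and read backward, which is the "therefore" validation described under step 5:

```python
# Illustrative only: a why-chain as a list, symptom first, root last.
# The incident details are invented for the sketch.
chain = [
    "release rolled back",                       # observed symptom
    "cart-checkout returned 500 on ~8% of traffic",
    "inventory query timed out under load",
    "query plan stopped using the index",
    "no functional index existed for the queried expression",  # root
]

def backward_reading(chain: list[str]) -> str:
    """Walk root -> symptom; every arrow should read as 'therefore'."""
    steps = list(reversed(chain))
    return "\n".join(
        f"{cause}\n  therefore -> {effect}"
        for cause, effect in zip(steps, steps[1:])
    )

print(backward_reading(chain))
```

If any "therefore" in the printed reading doesn't follow, the chain has a gap at that arrow.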
+ +## When to Use + +- Process failures: a code review missed something important; a deploy broke prod; a release was late +- Recurring incidents: "we've fixed this three times already" +- Organizational / workflow issues: CI is slow; PRs pile up; on-call gets paged at the same time of night +- Decision post-mortems: a choice was made that later proved wrong; why was that choice available / appealing / undetected? + +## When NOT to Use + +- Debugging a stack trace (use `root-cause-trace` — different technique, walks the call chain, not the causal chain) +- Local defect fixes where the fix site is obvious (just fix it) +- Blame-finding exercises — this skill actively rejects "human error" as a terminal answer +- Situations where there's genuinely no causal chain (a truly random hardware failure doesn't have five whys) + +## Workflow + +### 1. State the Problem Precisely + +Fuzzy problems produce fuzzy answers. Write the observed fact in one sentence. Include when, what, measurable impact. + +Good: "The 2026-04-17 release was rolled back at 14:02 after the cart-checkout endpoint returned 500 on ~8% of traffic." + +Bad: "Our releases are flaky." + +### 2. Ask Why — One Answer + +Not three possible answers. One best-supported answer, based on evidence you can point to. If the question genuinely has multiple independent causes, you'll branch in step 4. + +``` +Why did the release roll back? + → The cart-checkout endpoint returned 500 on ~8% of traffic. +``` + +### 3. Take the Answer as the New Question + +``` +Why did cart-checkout return 500? + → The database query for inventory timed out under load. + +Why did it time out under load? + → Query plan changed; a new index wasn't picked up. + +Why wasn't the index picked up? + → The query uses a function on an indexed column; the index wasn't a functional index. + +Why was that only a problem now? + → Traffic crossed the threshold where the non-functional scan dominates. 
+``` + +Four whys in; we've gone from "release rolled back" to "missing functional index for an 8-month-old query." + +### 4. Branch When Honest + +Some chains fork. When an answer is *and* rather than *just*, explore each branch separately. + +``` +Why wasn't this caught in testing? + → (a) Staging load is 1/100 of production + → (b) The index was added during a migration that didn't run in staging + +Pursue (a): Why is staging load so low? + → Staging runs on cheaper infra; no one funded realistic load testing. + → Organizational: load-test budget not in ops budget. + +Pursue (b): Why did the migration not run in staging? + → Migrations run manually per-env; nobody ran it in staging. + → Process: manual migrations are fragile; no automation requires them pre-prod. +``` + +Each branch gets its own root. The eventual fix usually touches more than one. + +### 5. Validate by Walking Backward + +Once a chain looks complete, walk it from root to symptom. If each step produces the next, the chain is sound. + +``` +No functional index on an indexed-column function + → query plan scans the column under load + → query times out under load + → cart-checkout returns 500 + → release rolls back +``` + +Every arrow should read as "therefore." If one doesn't, the chain has a gap — a why you skipped or an answer that wasn't actually the cause. + +### 6. Reject Convenient Stops + +Stop conditions that seem terminal but aren't: + +- **"Human error."** Always has a why behind it. *Why* did the system allow the human error? *Why* was the error easy to make? "Alice forgot to run the migration" isn't root — the root is "migration execution was manual, undocumented, and unchecked." +- **"Budget / staffing."** Often correct at *some* level, but almost never the deepest cause. *Why* wasn't this funded? *Why* wasn't it prioritized? The answer may be organizational (we don't track incident cost well) or informational (the bug's frequency wasn't visible). 
+- **"Just a one-off."** If it happened once, you don't need this skill. You're using this skill because the problem is patterned. + +Keep asking until the cause is a missing mechanism (a check, an automation, a doc, a visibility tool) — not a person and not a shortfall of will. + +### 7. Propose Fixes at the Root + +For each root cause found, propose a fix that's structural. Good fixes are almost always one of: + +- A **check** added to a pipeline that would have caught the issue +- An **automation** that removes a manual step where the error occurred +- A **documentation / onboarding change** that makes the requirement discoverable +- A **visibility tool** (dashboard, metric, alert) that would have surfaced the problem earlier + +Fixes that treat symptoms (adding a try/except, a retry, a "remember to X" in the runbook without a mechanism to enforce X) are not root-cause fixes — they're patches. + +## Output + +Produce a short markdown report: + +```markdown +## Five Whys: <incident / problem name> + +**Problem:** <one-sentence precise statement> + +**Chain:** + +1. Why did X happen? + → <answer> +2. Why did that happen? + → <answer> +3. ... +4. Root: <root cause> + +**Branches** (if any): + +- Branch A: <fork question> → ... → Root A +- Branch B: <fork question> → ... → Root B + +**Validation (backward):** + +Root → step N → step N-1 → … → symptom. Each step follows. + +**Root-cause fixes:** + +- [ ] <Structural fix at Root A> — who / when +- [ ] <Structural fix at Root B> — who / when + +**Not fixes (patches to avoid):** + +- <what NOT to do as a substitute for root-cause work> +``` + +## Anti-Patterns + +- **Stopping at exactly five whys.** Five is a convention, not a stop condition. Stop when further questioning yields no new information. +- **Accepting "human error" as root.** Always has a deeper why — the system that allowed the error. +- **Skipping a why because the answer feels obvious.** Write it down. 
Obvious answers sometimes reveal themselves as wrong when explicit. +- **One answer per why when honest branching is present.** Fork when the chain really forks. +- **Fix that adds a runbook entry.** Runbook entries without enforcement are a wish, not a fix. +- **Blaming the tool or vendor.** Sometimes true, but usually there's a why upstream about why the tool was chosen or why its behavior wasn't validated. +- **Confusing this with a stack-trace walk.** Different technique, different skill (`root-cause-trace`). This one is for process and decision chains, not code execution chains. + +## Related Patterns + +- **Plan → Do → Check → Act** — iterative improvement cycle after root-cause work; useful for systemic changes that can't be one-shot. Not covered by this skill; do the PDCA manually when the root-cause fix is itself a multi-week program. +- **Cause-and-effect / Ishikawa diagram** — useful when a problem has many contributing factors and you need to categorize before drilling. Five whys works well *inside* each branch of an Ishikawa. +- **`root-cause-trace`** — the code-execution counterpart. Use that for stack-trace bugs; use this for process / decision bugs. + +## Hand-Off + +- **Root-cause fix is structural and well-scoped** → implement it; add tests / checks / automation as relevant +- **Root-cause fix is a multi-week program** → open a tracking issue; assign; consider applying PDCA cycles to its rollout +- **Several whys revealed preserveable insights** → run `memorize` to capture the patterns +- **Root is architectural** → run `arch-evaluate` to see if the same structural gap exists elsewhere + +## Key Principles + +- **Precision in the problem statement.** One sentence. Measurable. When, what, how bad. +- **One best-supported answer per why** — or an honest branch, not a shrug. +- **Keep asking until the cause is a missing mechanism, not a person.** +- **Validate backwards.** "Therefore" should read plausibly at every step. 
+- **Fix at the root.** Structural changes, not patches. diff --git a/memorize/SKILL.md b/memorize/SKILL.md new file mode 100644 index 0000000..f77540f --- /dev/null +++ b/memorize/SKILL.md @@ -0,0 +1,192 @@ +--- +name: memorize +description: Curate concrete, actionable insights into the project's `CLAUDE.md` so they survive across sessions and compound over time. Harvests recent-session learnings (patterns that worked, anti-patterns that bit, API gotchas, verification checks), filters against quality gates (atomic, evidence-backed, non-redundant, verifiable, safe), and writes into a dedicated `## Memorized Insights` section rather than scattering entries. Use after a productive session, a bug fix that revealed a non-obvious pattern, or an explicit review where you want learnings preserved. Supports `--dry-run` to preview, `--max=N` to cap output, `--target=<path>` to write elsewhere, `--section=<name>` to override the destination section. Flags insights that look cross-project and suggests promotion to `~/code/rulesets/claude-rules/` instead. Do NOT use for session wrap-up / progress summaries (not insights), for private personal context (auto-memory handles that, not a tracked file), or for rules that belong in a formal rules file (those go under `.claude/rules/`). Synthesis of Agentic Context Engineering (ACE, arXiv:2510.04618) — grow-and-refine without context collapse. +--- + +# Memorize + +Turn transient session insights into durable, actionable entries in the project's `CLAUDE.md`. The goal: each run of this skill should make the next session measurably better, without diluting what's already there. 
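For orientation, two hypothetical invocations — flag spellings follow the Arguments section of this document; the exact command form is an assumption:

```
/memorize --dry-run --max=5            # preview the five strongest candidates; write nothing
/memorize --source=commits:HEAD~10..   # harvest from the last ten commits; write to ./CLAUDE.md
```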
+ +## When to Use + +- End of a productive session where you want concrete patterns preserved +- After a bug fix that revealed a non-obvious constraint or gotcha +- After a review where a pattern was identified as worth repeating (or avoiding) +- When you notice yourself re-deriving the same insight — it belongs written down + +## When NOT to Use + +- For a session wrap-up summary (that's narrative, not an insight) +- For personal/private context (auto-memory at `~/.claude/projects/<cwd>/memory/` captures that — see below) +- For formal project rules — those belong in `.claude/rules/*.md` +- When you have no specific, evidence-backed insight yet — skip rather than fabricate + +## Relationship to Other Memory Systems + +Three distinct systems, zero overlap: + +| System | Location | Scope | Purpose | +|---|---|---|---| +| **auto-memory** | `~/.claude/projects/<cwd>/memory/` | Private, per-working-directory | Session-bridging context about the user and project (feedback, user traits, project state). Written continuously by the agent. | +| **`/memorize` (this skill)** | Project `CLAUDE.md` | Public, tracked, per-project | Explicit, curated rules and patterns. Written deliberately by the user invocation. | +| **Formal rules** | `.claude/rules/*.md`, `~/code/rulesets/claude-rules/` | Public, tracked, per-project or global | Stable policy (style, conventions, verification). Authored once, rarely updated. | + +Use the right system for the right content. + +## Workflow + +Four phases. Each can be skipped if it has no content; none should be silently merged. + +### Phase 1 — Harvest + +Identify candidate insights from recent work. 
Look at: + +- The session transcript (or files referenced by `--source`) +- Recent commits and their messages +- Any `.architecture/evaluation-*.md` from `arch-evaluate` +- Reflection or critique outputs if they exist +- Anti-patterns you caught yourself falling into + +**Extract only:** + +- **Patterns that worked** — preferably with a minimum precondition and a worked example or reference +- **Anti-patterns that bit** — with the observable symptom and the reason +- **API / tool gotchas** — auth quirks, rate limits, idempotency, error codes +- **Verification items** — concrete checks that would catch regressions next time +- **Specific thresholds** — "pagination above 50 items" not "pagination when needed" + +**Exclude:** + +- Progress narrative ("today we shipped X") +- Personal preferences ("I like functional style") +- Vague aphorisms ("write good code") +- Unverified claims (if you can't cite code, docs, or repeated observation, skip) + +### Phase 2 — Filter (Grow-and-Refine) + +For each candidate insight, apply these gates. Fail any → drop the entry. + +- **Actionable.** A reader could apply this immediately. "Write good code" fails; "For dataset lookups under ~100 items, Object outperforms Map in V8" passes. +- **Specific.** Names a threshold, a file, a flow, a version, or a named tool. Generic insights are noise. +- **Evidence-backed.** Derived from code you just read, docs you just verified, or a pattern observed more than once. Speculation doesn't count. +- **Atomic.** One idea per bullet. If the insight has two distinct parts, it's two bullets. +- **Non-redundant.** Check existing `CLAUDE.md` content. If something similar exists, prefer merging or skipping over duplicating. If the new one is genuinely more specific and evidence-backed than the existing one, append it and mark the older one with `(candidate for consolidation)` — don't auto-delete prior user content. +- **Safe.** No secrets, tokens, private URLs, or PII. 
Nothing that would leak in a public commit. +- **Stable.** Prefer patterns that'll remain valid. If version-specific, say so. + +### Phase 3 — Write + +Write approved insights to a dedicated section of `CLAUDE.md`. Default section name: **`## Memorized Insights`**. Override with `--section=<name>`. + +**Discipline:** + +- **One section only.** Don't scatter entries across CLAUDE.md. All memorized content in one place means future `/memorize` runs and human readers find it fast. +- **Create the section if absent.** Place it near the end of CLAUDE.md, before any footer links. +- **Preserve chronology within the section.** Newer entries appended; don't shuffle. +- **Include provenance.** Each entry gets a date and, where useful, a one-word source hint (`pattern:`, `gotcha:`, `threshold:`, `anti-pattern:`, `verify:`). + +**Entry format:** + +```markdown +- **<short title or rule>.** <One or two sentences. Concrete. Actionable.> (`<source-hint>` — YYYY-MM-DD) +``` + +Examples: + +```markdown +- **Pagination threshold.** Fetch endpoints returning >50 items must paginate; clients assume everything ≤50 is complete. (`threshold` — 2026-04-19) +- **Map vs Object for small lookups.** In V8, Object outperforms Map for <~100 keys; Map wins at 10k+. Use Object for hot config lookups. (`pattern` — 2026-04-19) +- **Never log `load-file-name` from batch-compile context.** Both `load-file-name` and `buffer-file-name` are nil during top-level evaluation; `file-name-directory nil` raises `stringp, nil`. (`gotcha` — 2026-04-19) +``` + +### Phase 4 — Validate + +After writing, check: + +- [ ] Every entry passed all Phase 2 gates +- [ ] Each entry is atomic (one idea) +- [ ] No near-duplicates were created +- [ ] The `## Memorized Insights` section is coherent — entries flow, categories aren't interleaved randomly + +If the validation surfaces a problem, fix before exiting. + +## Cross-Project Promotion + +Some insights apply to *all* the user's projects, not just this one. 
Examples: + +- "Always emit JSON with a stable key order for git diffs" +- "For TypeScript libraries, expose types via `package.json#exports`" + +When an insight reads as general rather than project-specific, the skill emits a **promotion hint** at the end of the run: + +``` +Promotion candidates: +- "JSON stable key order" — reads as general. Consider adding to: + ~/code/rulesets/claude-rules/style.md + (would apply to every project via global install) + +Keep as project-specific, promote, or drop? [k/p/d] +``` + +Promotion happens manually — the skill doesn't edit the rulesets repo automatically. The hint is a nudge to think about scope. + +## Arguments + +- **`--dry-run`** — show the proposed entries and where they'd be written; do not modify any files. +- **`--max=N`** — cap output to the top N insights by specificity + evidence. +- **`--target=<path>`** — write to a different file. Defaults to `./CLAUDE.md`. Use e.g. `docs/learnings.md` if the project prefers a separate file. +- **`--section=<name>`** — override the default `## Memorized Insights` section name. +- **`--source=<spec>`** — scope what gets harvested. Values: `last` (most recent message), `selection` (a user-highlighted region if supported), `chat:<id>` (a specific past conversation), `commits:<range>` (e.g., `commits:HEAD~10..`). Defaults to a reasonable window of recent session context. + +## Output + +On a real run (not `--dry-run`): + +1. Short summary — "added N entries to `<target>`: X patterns, Y gotchas, Z thresholds." +2. Any promotion candidates flagged for global-rules consideration. +3. Confirmation of the file path modified. + +On `--dry-run`: + +1. Preview of each proposed entry with the section it would land in. +2. Flagged promotions. +3. Explicit confirmation nothing was written. + +## Anti-Patterns + +- **Summarizing the session instead of extracting insights.** "Today we refactored X" is narrative. "X's public API requires parameters in Y order due to Z" is an insight. 
+- **Writing entries without evidence.** If you can't point to code, docs, or multiple observations, the entry is speculation. +- **Overwriting prior content.** Mark conflicts for consolidation; don't auto-delete what the user wrote. +- **Scattering entries.** One section. Grep-able. Coherent. +- **Batch-writing 20 entries in one session.** If the session generated 20 real insights, many of them aren't. Filter harder. 3-5 genuine entries per run is typical. +- **Adding to `CLAUDE.md` when auto-memory is the right system.** Private user-context goes in auto-memory; public project rules go here; static policy goes in `.claude/rules/`. +- **Promoting too eagerly.** "This applies to all projects" is a strong claim. If you can't name three unrelated projects where the rule would fire, it's project-specific. + +## Review Checklist + +Before accepting the additions: + +- [ ] Each entry has a source hint and a date +- [ ] Each entry passes the atomic / specific / evidence-backed / non-redundant / safe / stable gates +- [ ] The `## Memorized Insights` section exists exactly once in the target +- [ ] Promotion candidates (if any) have been flagged, not silently promoted +- [ ] `--dry-run` was used at least once if the target file is under active template management (e.g. bundled CLAUDE.md from rulesets install) + +## Maintenance + +`CLAUDE.md` accumulates over time. Periodically: + +- **Consolidate.** Entries marked `(candidate for consolidation)` deserve a look. Often the older entry is superseded; sometimes it's a special case of the newer one. +- **Retire.** Entries about deprecated tools or obsolete versions should be removed explicitly (with a commit message noting the retirement). +- **Promote.** Re-scan for entries that have started firing across multiple projects — those belong in `~/code/rulesets/claude-rules/`. 
+- **Trim.** If the section grows past ~40 entries, either the project is complex enough to warrant splitting `CLAUDE.md` into multiple files, or the entries haven't been curated aggressively enough. + +## Theoretical Background + +The grow-and-refine / evidence-backed approach draws on Agentic Context Engineering (Zhang et al., *Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models*, arXiv:2510.04618). Key ideas borrowed: + +- **Generation → Reflection → Curation** as distinct phases, not a single compression step +- **Grow-and-refine** — accumulate granular knowledge rather than over-compressing to vague summaries +- **Avoiding context collapse** — resist the temptation to rewrite old entries into smoother prose; specificity is the value + +This skill implements the Curation phase. Reflection and Generation happen in the main conversation. diff --git a/root-cause-trace/SKILL.md b/root-cause-trace/SKILL.md new file mode 100644 index 0000000..210ee69 --- /dev/null +++ b/root-cause-trace/SKILL.md @@ -0,0 +1,236 @@ +--- +name: root-cause-trace +description: Given an error that manifests deep in the call stack, trace backward through the call chain to find the original trigger, then fix at the source and add defense-in-depth at each intermediate layer. Covers the backward-trace workflow (observe symptom → identify immediate cause → walk up the chain → find origin → fix + layer defenses), when and how to add instrumentation (stack capture before the dangerous operation, not after), and the bisection pattern for identifying which test pollutes shared state. Use when an error appears in the middle or end of an execution path, when a stack trace shows a long chain, when invalid data has unknown origin, or when a failure reproduces inconsistently across runs. 
Do NOT use for clear local bugs where the fix site is obvious (just fix it), for design-level root-cause analysis of processes/decisions (use five-whys instead), for performance regressions (different class of investigation), or when there's no symptom yet to trace from. Companion to the general `debug` skill — `debug` is broader; `root-cause-trace` is specifically the backward-walk technique. +--- + +# Root-Cause Trace + +Bugs often surface far from their origin. The instinct is to patch where the error appears. That usually hides the real problem and leaves the next victim to find it again. This skill documents the systematic backward walk to the actual trigger, plus the defense layers that make the bug impossible rather than merely fixed. + +## Core Principle + +**Trace backward through the call chain until you find the original trigger. Fix at the source. Add validation at each intermediate layer that could have caught it.** + +Fixing only the symptom creates debt. The same invalid value will flow into the next downstream call site eventually. + +## When to Use + +- The error appears deep in execution, not at the entry point +- A stack trace shows a long chain and it's unclear which level introduced the problem +- Invalid data's origin is unknown — "how did this null / empty / wrong value get here?" +- A failure reproduces inconsistently and seems to depend on earlier state +- You catch yourself about to "just add a null check" without knowing why it's null + +## When NOT to Use + +- Simple, local bugs where the fix site is obvious (just fix it; don't over-engineer) +- Root-cause analysis of processes, decisions, or organizational issues (use `five-whys`) +- Performance regressions — different kind of investigation (profiling, benchmarking) +- You have no concrete symptom yet; you're still hunting for the bug — use general `debug` first + +## Workflow + +Five steps. Don't skip any; each changes what the next is looking for. + +### 1. 
Observe the Symptom + +State the symptom precisely. Paste the error. Note where it occurred. + +``` +Observed: `TypeError: Cannot read property 'name' of undefined` + at src/service/order.ts:142 in formatOrderLine() +``` + +Precision matters because the next step depends on *exactly* what failed. + +### 2. Identify the Immediate Cause + +What code directly produced the error? Not the root cause — the thing right before the failure. + +``` +formatOrderLine(order) { + return `${order.customer.name} — ${order.total}`; // ← here +} +``` + +Immediate cause: `order.customer` is undefined. The formatting line runs on a malformed order. + +### 3. Walk the Call Chain Up + +Ask: *what called this, and with what arguments?* + +``` +formatOrderLine(order) ← called from renderInvoice(orderList) + → orderList came from fetchOrders() + → fetchOrders() came from the cron job handler + → the cron job handler passed the list to renderInvoice unchecked +``` + +At each level, note: + +- Which function called the next +- What argument / value was passed +- Was the value wrong at this level or did it look OK? + +Keep walking until you find the point where the value *became* wrong. + +### 4. Find the Original Trigger + +The trigger is the point where reality diverged from the expected state. Examples: + +- A config default was empty, and nothing validated it +- A test fixture was accessed before its setup hook ran +- An API response changed shape and no parser rejected the old shape +- A database row was written in a partial state + +State the trigger in one sentence: + +``` +Trigger: `orderList` contained orders with `customer: undefined` because the +database query used `LEFT JOIN` where `INNER JOIN` was intended; customer-less +orders entered the result set for the first time after the Oct 2 schema change. +``` + +Now you understand the bug. + +### 5. Fix at the Source, Then Add Defense + +Two actions, in order: + +**a. 
Fix at the trigger.** In the example: change the query to `INNER JOIN`, or explicitly handle customer-less orders (depending on intent). + +**b. Add defense-in-depth.** For each layer between the trigger and the symptom, ask: *could this layer have caught the bad value?* If yes, add the check. + +- Parser/validator layer: reject rows without `customer_id` +- Service layer: throw if `order.customer` is nil instead of passing it downstream +- Formatter layer: render "Unknown customer" rather than crashing + +Each defense means the next time something similar happens, it surfaces earlier and with better context. The goal isn't any single check — it's that the bad value can't propagate silently. + +## Adding Instrumentation + +When the chain is long or the trigger is hard to identify, add stack-capturing logs **before** the suspect operation, not after it fails. + +Capture, at minimum: + +- The value being operated on +- The current working directory / environment / relevant globals +- A full stack trace via the language's native mechanism + +Language examples: + +```typescript +// Before calling a function that might fail +function callDangerousOp(arg: string) { + console.error('TRACE dangerous-op entry', { + arg, + cwd: process.cwd(), + nodeEnv: process.env.NODE_ENV, + stack: new Error().stack, + }); + return dangerousOp(arg); +} +``` + +```python +import os, traceback, logging + +def call_dangerous_op(arg): + logging.error( + "TRACE dangerous-op entry: arg=%r cwd=%s\n%s", + arg, os.getcwd(), ''.join(traceback.format_stack()) + ) + return dangerous_op(arg) +``` + +```go +func callDangerousOp(arg string) error { + log.Printf("TRACE dangerous-op entry: arg=%q stack=%s", arg, debug.Stack()) + return dangerousOp(arg) +} +``` + +**Tactical rules:** + +- **Use stderr / equivalent, not framework loggers.** In tests, the framework logger is often suppressed or buffered; stderr survives. 
+- **Log before the dangerous operation, not after.** If the op crashes, post-call logs never fire. +- **Include enough context.** The value alone isn't enough; you need cwd, relevant env, and the full stack. +- **Name the trace uniquely.** `grep TRACE dangerous-op` should find only these lines. + +## Identifying Test Pollution + +When a failure appears during a test run but the offending test is unclear, bisect: + +1. Run tests one at a time, in file order, until the failure first appears +2. The failure's first-appearance test is either the polluter, or ran after the polluter and observed the polluted state + +A small shell helper can automate the scan: + +```bash +# For each test file, run it and check for the pollution symptom. +# Replace the symptom-check with something specific to your failure. +for t in tests/test-*.el; do + clean_state + run_test "$t" >/dev/null 2>&1 + if symptom_present; then + echo "First symptom after: $t" + break + fi +done +``` + +Once identified, trace backward (steps 2-4 above) from the polluting operation. + +## Real-World Patterns + +- **Empty string as path.** An uninitialized or early-read config value often presents as `""`. Many system calls silently treat `""` as the current directory. Symptom: file appears in an unexpected place. Trigger: a getter was accessed before its initializer ran. +- **Stale cache / wrong TTL.** Symptom: new code behaves like old code. Trigger: a cached value from before the change is still live. +- **Partial write / torn state.** Symptom: data looks half-correct. Trigger: a multi-step write wasn't atomic and crashed between steps. +- **Fixture access ordering.** Symptom: test fails only when run alone or only when run with others. Trigger: a fixture is read before its setup hook or mutated by a prior test. + +When you recognize one of these shapes, you can shortcut the trace — but *verify* the shape before committing to the pattern-match. 
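The empty-string-as-path shape from the list above is easy to verify directly. A minimal Python sketch; `resolve_output_dir` is a hypothetical config-reading helper added for illustration:

```python
import os

# Path helpers quietly resolve "" to the current directory instead of
# failing, which is why an uninitialized config value surfaces as a file
# written to an unexpected place.
assert os.path.abspath("") == os.getcwd()
assert os.path.join("", "report.txt") == "report.txt"  # relative: lands in cwd

def resolve_output_dir(configured: str) -> str:
    # Defense at the layer that reads the value: reject "" at its source,
    # so the bug surfaces where the config was mis-read, not downstream.
    if not configured:
        raise ValueError("output dir is empty; config read before initialization?")
    return os.path.abspath(configured)
```

Rejecting at the read site turns the misplaced-file symptom into an immediate, well-located error.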
+ +## Anti-Patterns + +- **Catching the error and returning a safe default.** Now every caller gets sanitized output without knowing the upstream bug exists. Defense-in-depth means *logging and surfacing* the bad value, not silencing it. +- **Fixing at the symptom site.** You treated a symptom; the next invocation of the same flow fails the same way. +- **Adding a null check at every layer.** That's defense-by-armor, not defense-by-validation. Each check should reject and report, not coerce and hide. +- **Logging after the failure.** Post-call logs don't fire when the call crashes. Log before. +- **Using the framework logger in tests.** Buffered, sometimes redirected, often invisible. Use stderr. +- **Stopping at the first "cause" that seems plausible.** Keep asking "what made *that* happen?" until you reach the real trigger, not a convenient intermediate. + +## Output + +After the trace, produce a concise summary: + +``` +## Root Cause Trace: <short name> + +**Symptom:** <what the user saw> +**Immediate cause:** <failing operation + direct reason> +**Trace chain:** + 1. formatOrderLine called with undefined customer + 2. renderInvoice passed the order through unchecked + 3. fetchOrders returned a list including customer-less orders + 4. Query used LEFT JOIN where INNER JOIN was intended (trigger) + +**Fix at source:** changed query to INNER JOIN; added test for schema change. +**Defenses added:** + - parser rejects rows without customer_id + - service layer throws on null customer + - formatter renders "Unknown" instead of crashing + +**Memorize-worthy insight:** <if any> +``` + +The last line is a hand-off to `memorize` if the pattern is worth preserving. 
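The coerce-versus-reject distinction from the anti-patterns above can be made concrete with the running order example. A minimal Python sketch; both function names are hypothetical:

```python
# Coerce-and-hide (anti-pattern): every caller gets sanitized output,
# and the upstream bug stays invisible.
def customer_name_hidden(order: dict) -> str:
    return (order.get("customer") or {}).get("name", "Unknown")

# Reject-and-report: refuse the bad value with enough context to start
# the backward walk from this layer.
def customer_name_checked(order: dict) -> str:
    customer = order.get("customer")
    if customer is None:
        raise ValueError(
            f"order {order.get('id')!r} has no customer; "
            "an upstream query or parser let a customer-less row through"
        )
    return customer["name"]
```

Only the outermost presentation layer should degrade gracefully (the "Unknown customer" rendering from step 5); every inner layer should raise, so the bad value never propagates silently.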
+ +## Hand-Off + +- If the trace revealed a pattern worth preserving → run `memorize` +- If the trace revealed a process/decision failure (a check that should have existed in CI, a review that should have caught it) → run `five-whys` on the process +- If the trace revealed an architectural violation (layer boundary crossed, contract broken) → run `arch-evaluate` to see if other places have the same issue + +## Key Principle — Restated + +**Never fix just where the error appears.** Trace back. Fix at the trigger. Make the bug impossible rather than merely absent in this one path.
