docs(commands): make arch-evaluate findings honest about certainty

Two audit fixes. Framework-agnostic findings (Claude reading import graphs) now carry a confidence level (High/Medium/Low) and how it was determined, with a required "not fully checked because" note when scale or dynamic imports cap certainty, so a partial read isn't presented as exhaustive. Unconfigured language tools are no longer skipped silently: each detected language whose tool didn't run gets an Info finding, so the audit shows what was and wasn't verified.
author: Craig Jennings <c@cjennings.net> 2026-05-22 15:03:30 -0500
committer: Craig Jennings <c@cjennings.net> 2026-05-22 15:03:30 -0500
commit: 8f14e8131d691b2026f6c82a13cfd10ccb350892 (patch)
tree: 5549c091382f85276c45aa85e0a188a737309265
parent: b43ea88a539b4965e29fdd7e10fb432cac2d3fbd (diff)
1 files changed, 30 insertions, 3 deletions
diff --git a/.claude/commands/arch-evaluate.md b/.claude/commands/arch-evaluate.md
index 5ed4450..d469833 100644
--- a/.claude/commands/arch-evaluate.md
+++ b/.claude/commands/arch-evaluate.md
@@ -37,7 +37,7 @@ If the brief is missing, stop and tell the user to run `arch-design` first. Do n
 1. **Load the brief and ADRs.** Extract: declared paradigm, layers (if any), forbidden dependencies, module boundaries, API contracts that matter.
 2. **Detect repo languages and linters.** Inspect `package.json`, `pyproject.toml`, `go.mod`, `pom.xml`, etc.
 3. **Run framework-agnostic checks.** Always. These never need tooling.
-4. **Run language-specific tools if configured.** Opportunistic — only if the repo already has `.dependency-cruiser.cjs`, `.importlinter`, `.golangci.yml` with import rules, etc. Never install tooling.
+4. **Run language-specific tools if configured.** Opportunistic — only if the repo already has `.dependency-cruiser.cjs`, `.importlinter`, `.golangci.yml` with import rules, etc. Never install tooling. For each detected language whose tool isn't configured (or can't run), emit an Info "tool not configured / not run" finding rather than skipping silently — the reader needs to see which checks were verified and which weren't.
 5. **Combine findings.** Deduplicate across sources. Label each finding with provenance (native / tool).
 6. **Produce report.** Severity-sorted markdown at `.architecture/evaluation-<date>.md`.
 
@@ -45,6 +45,18 @@ If the brief is missing, stop and tell the user to run `arch-design` first. Do n
 
 These work on any language. Claude reads the code and applies the policy from the brief.
 
+### Confidence and Provenance
+
+Framework-agnostic findings come from Claude reading import graphs and comparing public APIs by eye. That read can be incomplete: runtime imports, reflection, dynamic dispatch, and metaprogramming hide edges that a static read of the source never sees, and in a large codebase the graph may be too big to walk in full. Don't present a partial read as exhaustive.
+
+Tag every framework-agnostic finding with a confidence level and how it was determined:
+
+- **High** — the edge or signature was read directly from source and the relevant graph was walked in full.
+- **Medium** — read directly, but the surrounding graph was sampled rather than walked end to end, or the language permits dynamic edges that weren't ruled out.
+- **Low** — inferred from a partial read; dynamic imports, reflection, or scale leave real uncertainty.
+
+When scale or dynamic imports cap the certainty, add an explicit `Not fully checked because …` note to the finding (and to the per-check section when a whole check was capped). State what was skipped and why — e.g. "graph exceeds ~100k lines; only `src/domain` walked," or "package uses `importlib` at runtime; static edges may be incomplete." A finding with no such note asserts a full read; don't imply one you didn't do.
+
 ### 1. Cyclic Dependencies
 
 Scan imports/requires/includes across the codebase. Build the module graph. Report any cycles.
@@ -99,7 +111,19 @@ The brief may list forbidden imports explicitly. Example: "Domain module must no
 
 ## Language-Specific Tools (Opportunistic)
 
-These run only if the user's repo has a config file already present. If not configured, skip silently — the framework-agnostic checks still run.
+These run only if the user's repo has a config file already present. If not configured, don't run the tool — but don't skip it silently either. A check that didn't run must not read as a pass: a silent skip makes the audit look more complete than it was.
+
+For each language detected in step 2, name the tool the audit would normally run and whether it ran. When the tool isn't configured (or its config is present but the binary isn't installed), emit an Info finding recording that the check was expected but not run, so the reader sees exactly what was and wasn't verified:
+
+```markdown
+#### I2. Tool not run: import-linter (Python detected, not configured)
+
+- **Source:** language-specific (import-linter)
+- **Rule:** n/a — coverage note, not a violation.
+- **Detail:** Python detected via `pyproject.toml`, but no `[tool.importlinter]` config is present, so the layer-contract check did not run. Framework-agnostic checks still covered cycles and layer arrows. To enable: configure `import-linter` (see Tool Install Commands).
+```
+
+The framework-agnostic checks still run regardless.
 
 ### TypeScript — dependency-cruiser
 
@@ -242,6 +266,7 @@ Write the report to `.architecture/evaluation-<YYYY-MM-DD>.md`. Use this structu
 #### E1. Cyclic dependency: domain/user ↔ domain/order
 
 - **Source:** framework-agnostic
+- **Confidence:** High — both edges read directly from source; `src/domain` graph walked in full.
 - **Files:** `src/domain/user.py:14`, `src/domain/order.py:7`
 - **Rule:** Brief §7 — "Domain modules must not form cycles."
 - **Fix:** extract shared abstraction into a new module, or break the cycle by inverting one direction.
@@ -259,6 +284,7 @@ Write the report to `.architecture/evaluation-<YYYY-MM-DD>.md`. Use this structu
 #### W1. Public API drift: `OrderService.cancel()` added without ADR
 
 - **Source:** framework-agnostic
+- **Confidence:** Medium — `OrderService` exports read directly, but callers reach methods via a dynamic dispatch table; `Not fully checked because` the dispatch table may register names not visible in the class body.
 - **File:** `src/domain/order.py:142`
 - **Rule:** Brief §8 — "Public API additions require an ADR."
 - **Fix:** run `arch-decide` to record the rationale, or make `cancel()` non-public.
@@ -295,8 +321,9 @@ Write the report to `.architecture/evaluation-<YYYY-MM-DD>.md`. Use this structu
 Before handing off the report:
 
 - [ ] All framework-agnostic checks ran
-- [ ] Detected linters ran if configured; skipped silently if not
+- [ ] Detected languages listed; each expected linter either ran or has an Info "tool not configured / not run" finding (never a silent skip)
 - [ ] Each finding has: severity, source (native or tool name), file/line, rule reference, suggested fix
+- [ ] Each framework-agnostic finding carries a confidence level (High/Medium/Low) and, when scale or dynamic imports capped certainty, a `Not fully checked because …` note
 - [ ] Each finding links to the brief section or ADR that establishes the rule
 - [ ] Raw tool output preserved at the bottom for traceability
 - [ ] Report timestamped and commit-referenced
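
The two checklist items this patch adds (no silent tool skips, and a confidence level on every framework-agnostic finding) lend themselves to a mechanical check. The following is a minimal Python sketch, not part of the commit. The `Finding` dataclass, the `EXPECTED_TOOLS` table, and both function names are hypothetical. The config lookup is deliberately simplified to file presence, so it would not notice a `[tool.importlinter]` section inside `pyproject.toml`. Requiring a note on every Medium or Low finding is one reading of the "a finding with no such note asserts a full read" rule, assumed here rather than stated by the patch.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional

# Hypothetical table: manifest that signals a language -> (tool, config files).
# Simplified: only file presence is checked, not config sections inside
# other files, and not whether the tool binary is installed.
EXPECTED_TOOLS = {
    "package.json": ("dependency-cruiser", [".dependency-cruiser.cjs"]),
    "pyproject.toml": ("import-linter", [".importlinter"]),
    "go.mod": ("golangci-lint", [".golangci.yml"]),
}


@dataclass
class Finding:
    severity: str
    title: str
    source: str
    confidence: Optional[str] = None          # "High", "Medium", or "Low"
    not_fully_checked: Optional[str] = None   # text of the "Not fully checked because" note


def coverage_findings(repo: Path) -> list[Finding]:
    """Return one Info finding per detected language whose tool has no config."""
    findings = []
    for manifest, (tool, configs) in EXPECTED_TOOLS.items():
        if not (repo / manifest).exists():
            continue  # language not detected, so no tool was expected
        if any((repo / name).exists() for name in configs):
            continue  # configured; a real audit would run the tool here
        findings.append(Finding(
            severity="Info",
            title=f"Tool not run: {tool} ({manifest} detected, not configured)",
            source=f"language-specific ({tool})",
        ))
    return findings


def confidence_problems(finding: Finding) -> list[str]:
    """List the confidence-related checklist violations for one finding."""
    if finding.source != "framework-agnostic":
        return []  # the confidence rule only covers framework-agnostic findings
    if finding.confidence not in {"High", "Medium", "Low"}:
        return ["missing confidence level"]
    if finding.confidence != "High" and not finding.not_fully_checked:
        return ["capped confidence without a 'Not fully checked because' note"]
    return []
```

Running `coverage_findings` once per audit and `confidence_problems` over every finding before the report is written would turn both checklist items into failures that can be counted instead of items a reviewer has to remember.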