From aa6924591127970d3241ab6b1a50f4bab457da27 Mon Sep 17 00:00:00 2001
From: Craig Jennings <c@cjennings.net>
Date: Wed, 6 May 2026 06:17:08 -0500
Subject: refactor(skills): convert 16 user-invoked skills to commands

I converted 16 user-invoked skills to commands. Skills cost ~150-300 tokens each per session for descriptions the model uses to auto-route. Commands cost nothing until you type the slash. These 16 are workflows I always trigger deliberately. The auto-routing wasn't earning its keep. This reclaims ~4-5k tokens per session.

Nine skills stayed where auto-routing genuinely helps: debug, root-cause-trace, five-whys, add-tests, frontend-design, humanizer, playwright-js, playwright-py, and pairwise-tests. Pairwise-tests stays a skill because its helper files don't fit a single-file command shape.

For arch-decide, I preserved the upstream MIT LICENSE alongside the command at .claude/commands/arch-decide.LICENSE so attribution stays intact.
---
 .claude/commands/arch-decide.LICENSE       |  25 ++
 .claude/commands/arch-decide.md            | 451 +++++++++++++++++++++++++++++
 .claude/commands/arch-design.md            | 249 ++++++++++++++++
 .claude/commands/arch-document.md          | 317 ++++++++++++++++++++
 .claude/commands/arch-evaluate.md          | 332 +++++++++++++++++++++
 .claude/commands/brainstorm.md             | 195 +++++++++++++
 .claude/commands/c4-analyze.md             | 212 ++++++++++++++
 .claude/commands/c4-diagram.md             | 199 +++++++++++++
 .claude/commands/codify.md                 | 196 +++++++++++++
 .claude/commands/create-v2mom.md           | 426 +++++++++++++++++++++++++++
 .claude/commands/finish-branch.md          | 251 ++++++++++++++++
 .claude/commands/prompt-engineering.md     | 228 +++++++++++++++
 .claude/commands/respond-to-cj-comments.md | 232 +++++++++++++++
 .claude/commands/respond-to-review.md      |  54 ++++
 .claude/commands/review-code.md            | 353 ++++++++++++++++++++++
 .claude/commands/security-check.md         |  48 +++
 .claude/commands/start-work.md             | 269 +++++++++++++++++
 arch-decide/LICENSE                        |  25 --
 arch-decide/SKILL.md                       | 451 -----------------------------
 arch-design/SKILL.md                       | 249 ----------------
 arch-document/SKILL.md                     | 317 --------------------
 arch-evaluate/SKILL.md                     | 332 ---------------------
 brainstorm/SKILL.md                        | 195 -------------
 c4-analyze/SKILL.md                        | 212 --------------
 c4-diagram/SKILL.md                        | 199 -------------
 codify/SKILL.md                            | 196 -------------
 create-v2mom/SKILL.md                      | 426 ---------------------------
 finish-branch/SKILL.md                     | 251 ----------------
 prompt-engineering/SKILL.md                | 228 ---------------
 respond-to-cj-comments/SKILL.md            | 232 ---------------
 respond-to-review/SKILL.md                 |  54 ----
 review-code/SKILL.md                       | 353 ----------------------
 security-check/SKILL.md                    |  48 ---
 start-work/SKILL.md                        | 269 -----------------
 34 files changed, 4037 insertions(+), 4037 deletions(-)
 create mode 100644 .claude/commands/arch-decide.LICENSE
 create mode 100644 .claude/commands/arch-decide.md
 create mode 100644 .claude/commands/arch-design.md
 create mode 100644 .claude/commands/arch-document.md
 create mode 100644 .claude/commands/arch-evaluate.md
 create mode 100644 .claude/commands/brainstorm.md
 create mode 100644 .claude/commands/c4-analyze.md
 create mode 100644 .claude/commands/c4-diagram.md
 create mode 100644 .claude/commands/codify.md
 create mode 100644 .claude/commands/create-v2mom.md
 create mode 100644 .claude/commands/finish-branch.md
 create mode 100644 .claude/commands/prompt-engineering.md
 create mode 100644 .claude/commands/respond-to-cj-comments.md
 create mode 100644 .claude/commands/respond-to-review.md
 create mode 100644 .claude/commands/review-code.md
 create mode 100644 .claude/commands/security-check.md
 create mode 100644 .claude/commands/start-work.md
 delete mode 100644 arch-decide/LICENSE
 delete mode 100644 arch-decide/SKILL.md
 delete mode 100644 arch-design/SKILL.md
 delete mode 100644 arch-document/SKILL.md
 delete mode 100644 arch-evaluate/SKILL.md
 delete mode 100644 brainstorm/SKILL.md
 delete mode 100644 c4-analyze/SKILL.md
 delete mode 100644 c4-diagram/SKILL.md
 delete mode 100644 codify/SKILL.md
 delete mode 100644 create-v2mom/SKILL.md
 delete mode 100644 finish-branch/SKILL.md
 delete mode 100644 prompt-engineering/SKILL.md
 delete mode 100644 respond-to-cj-comments/SKILL.md
 delete mode 100644 respond-to-review/SKILL.md
 delete mode 100644 review-code/SKILL.md
 delete mode 100644 security-check/SKILL.md
 delete mode 100644 start-work/SKILL.md

diff --git a/.claude/commands/arch-decide.LICENSE b/.claude/commands/arch-decide.LICENSE
new file mode 100644
index 0000000..8a8b38e
--- /dev/null
+++ b/.claude/commands/arch-decide.LICENSE
@@ -0,0 +1,25 @@
+MIT License
+
+Copyright (c) 2024 Seth Hobson
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+
+---
+
+Origin: https://github.com/wshobson/agents/blob/main/plugins/documentation-generation/skills/architecture-decision-records/SKILL.md
diff --git a/.claude/commands/arch-decide.md b/.claude/commands/arch-decide.md
new file mode 100644
index 0000000..b0078fb
--- /dev/null
+++ b/.claude/commands/arch-decide.md
@@ -0,0 +1,451 @@
+---
+name: arch-decide
+description: Create, update, and manage Architecture Decision Records (ADRs) that capture significant technical decisions — context, options, chosen approach, consequences. Five template variants (MADR, Nygard, Y-statement, lightweight, RFC). Covers ADR lifecycle (proposed → accepted → deprecated / superseded), review process, and adr-tools automation. Use when documenting an architectural choice, reviewing past decisions, or establishing a decision process. Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
+---
+
+# Architecture Decision Records
+
+Comprehensive patterns for creating, maintaining, and managing Architecture Decision Records (ADRs) that capture the context and rationale behind significant technical decisions.
+
+## When to Use This Skill
+
+- Making significant architectural decisions
+- Documenting technology choices
+- Recording design trade-offs
+- Onboarding new team members
+- Reviewing historical decisions
+- Establishing decision-making processes
+
+## Core Concepts
+
+### 1. What is an ADR?
+
+An Architecture Decision Record captures:
+
+- **Context**: Why we needed to make a decision
+- **Decision**: What we decided
+- **Consequences**: What happens as a result
+
+### 2. When to Write an ADR
+
+| Write ADR                  | Skip ADR               |
+| -------------------------- | ---------------------- |
+| New framework adoption     | Minor version upgrades |
+| Database technology choice | Bug fixes              |
+| API design patterns        | Implementation details |
+| Security architecture      | Routine maintenance    |
+| Integration patterns       | Configuration changes  |
+
+### 3. ADR Lifecycle
+
+```
+Proposed → Accepted → Deprecated → Superseded
+              ↓
+           Rejected
+```
+
+## Templates
+
+### Template 1: Standard ADR (MADR Format)
+
+```markdown
+# ADR-0001: Use PostgreSQL as Primary Database
+
+## Status
+
+Accepted
+
+## Context
+
+We need to select a primary database for our new e-commerce platform. The system
+will handle:
+
+- ~10,000 concurrent users
+- Complex product catalog with hierarchical categories
+- Transaction processing for orders and payments
+- Full-text search for products
+- Geospatial queries for store locator
+
+The team has experience with MySQL, PostgreSQL, and MongoDB. We need ACID
+compliance for financial transactions.
+
+## Decision Drivers
+
+- **Must have ACID compliance** for payment processing
+- **Must support complex queries** for reporting
+- **Should support full-text search** to reduce infrastructure complexity
+- **Should have good JSON support** for flexible product attributes
+- **Team familiarity** reduces onboarding time
+
+## Considered Options
+
+### Option 1: PostgreSQL
+
+- **Pros**: ACID compliant, excellent JSON support (JSONB), built-in full-text
+  search, PostGIS for geospatial, team has experience
+- **Cons**: Slightly more complex replication setup than MySQL
+
+### Option 2: MySQL
+
+- **Pros**: Very familiar to team, simple replication, large community
+- **Cons**: Weaker JSON support, no built-in full-text search (need
+  Elasticsearch), no geospatial without extensions
+
+### Option 3: MongoDB
+
+- **Pros**: Flexible schema, native JSON, horizontal scaling
+- **Cons**: No ACID for multi-document transactions (at decision time),
+  team has limited experience, requires schema design discipline
+
+## Decision
+
+We will use **PostgreSQL 15** as our primary database.
+
+## Rationale
+
+PostgreSQL provides the best balance of:
+
+1. **ACID compliance** essential for e-commerce transactions
+2. **Built-in capabilities** (full-text search, JSONB, PostGIS) reduce
+   infrastructure complexity
+3. **Team familiarity** with SQL databases reduces learning curve
+4. **Mature ecosystem** with excellent tooling and community support
+
+The slight complexity in replication is outweighed by the reduction in
+additional services (no separate Elasticsearch needed).
+
+## Consequences
+
+### Positive
+
+- Single database handles transactions, search, and geospatial queries
+- Reduced operational complexity (fewer services to manage)
+- Strong consistency guarantees for financial data
+- Team can leverage existing SQL expertise
+
+### Negative
+
+- Need to learn PostgreSQL-specific features (JSONB, full-text search syntax)
+- Vertical scaling limits may require read replicas sooner
+- Some team members need PostgreSQL-specific training
+
+### Risks
+
+- Full-text search may not scale as well as dedicated search engines
+- Mitigation: Design for potential Elasticsearch addition if needed
+
+## Implementation Notes
+
+- Use JSONB for flexible product attributes
+- Implement connection pooling with PgBouncer
+- Set up streaming replication for read replicas
+- Use pg_trgm extension for fuzzy search
+
+## Related Decisions
+
+- ADR-0002: Caching Strategy (Redis) - complements database choice
+- ADR-0005: Search Architecture - may supersede if Elasticsearch needed
+
+## References
+
+- [PostgreSQL JSON Documentation](https://www.postgresql.org/docs/current/datatype-json.html)
+- [PostgreSQL Full Text Search](https://www.postgresql.org/docs/current/textsearch.html)
+- Internal: Performance benchmarks in `/docs/benchmarks/database-comparison.md`
+```
+
+### Template 2: Lightweight ADR
+
+```markdown
+# ADR-0012: Adopt TypeScript for Frontend Development
+
+**Status**: Accepted
+**Date**: 2024-01-15
+**Deciders**: @alice, @bob, @charlie
+
+## Context
+
+Our React codebase has grown to 50+ components with increasing bug reports
+related to prop type mismatches and undefined errors. PropTypes provide
+runtime-only checking.
+
+## Decision
+
+Adopt TypeScript for all new frontend code. Migrate existing code incrementally.
+
+## Consequences
+
+**Good**: Catch type errors at compile time, better IDE support, self-documenting
+code.
+
+**Bad**: Learning curve for team, initial slowdown, build complexity increase.
+
+**Mitigations**: TypeScript training sessions, allow gradual adoption with
+`allowJs: true`.
+```
+
+### Template 3: Y-Statement Format
+
+```markdown
+# ADR-0015: API Gateway Selection
+
+In the context of **building a microservices architecture**,
+facing **the need for centralized API management, authentication, and rate limiting**,
+we decided for **Kong Gateway**
+and against **AWS API Gateway and custom Nginx solution**,
+to achieve **vendor independence, plugin extensibility, and team familiarity with Lua**,
+accepting that **we need to manage Kong infrastructure ourselves**.
+```
+
+### Template 4: ADR for Deprecation
+
+```markdown
+# ADR-0020: Deprecate MongoDB in Favor of PostgreSQL
+
+## Status
+
+Accepted (Supersedes ADR-0003)
+
+## Context
+
+ADR-0003 (2021) chose MongoDB for user profile storage due to schema flexibility
+needs. Since then:
+
+- MongoDB's multi-document transactions remain problematic for our use case
+- Our schema has stabilized and rarely changes
+- We now have PostgreSQL expertise from other services
+- Maintaining two databases increases operational burden
+
+## Decision
+
+Deprecate MongoDB and migrate user profiles to PostgreSQL.
+
+## Migration Plan
+
+1. **Phase 1** (Week 1-2): Create PostgreSQL schema, dual-write enabled
+2. **Phase 2** (Week 3-4): Backfill historical data, validate consistency
+3. **Phase 3** (Week 5): Switch reads to PostgreSQL, monitor
+4. **Phase 4** (Week 6): Remove MongoDB writes, decommission
+
+## Consequences
+
+### Positive
+
+- Single database technology reduces operational complexity
+- ACID transactions for user data
+- Team can focus PostgreSQL expertise
+
+### Negative
+
+- Migration effort (~4 weeks)
+- Risk of data issues during migration
+- Lose some schema flexibility
+
+## Lessons Learned
+
+Document from ADR-0003 experience:
+
+- Schema flexibility benefits were overestimated
+- Operational cost of multiple databases was underestimated
+- Consider long-term maintenance in technology decisions
+```
+
+### Template 5: Request for Comments (RFC) Style
+
+```markdown
+# RFC-0025: Adopt Event Sourcing for Order Management
+
+## Summary
+
+Propose adopting event sourcing pattern for the order management domain to
+improve auditability, enable temporal queries, and support business analytics.
+
+## Motivation
+
+Current challenges:
+
+1. Audit requirements need complete order history
+2. "What was the order state at time X?" queries are impossible
+3. Analytics team needs event stream for real-time dashboards
+4. Order state reconstruction for customer support is manual
+
+## Detailed Design
+
+### Event Store
+
+```
+OrderCreated { orderId, customerId, items[], timestamp }
+OrderItemAdded { orderId, item, timestamp }
+OrderItemRemoved { orderId, itemId, timestamp }
+PaymentReceived { orderId, amount, paymentId, timestamp }
+OrderShipped { orderId, trackingNumber, timestamp }
+```
+
+### Projections
+
+- **CurrentOrderState**: Materialized view for queries
+- **OrderHistory**: Complete timeline for audit
+- **DailyOrderMetrics**: Analytics aggregation
+
+### Technology
+
+- Event Store: EventStoreDB (purpose-built, handles projections)
+- Alternative considered: Kafka + custom projection service
+
+## Drawbacks
+
+- Learning curve for team
+- Increased complexity vs. CRUD
+- Need to design events carefully (immutable once stored)
+- Storage growth (events never deleted)
+
+## Alternatives
+
+1. **Audit tables**: Simpler but doesn't enable temporal queries
+2. **CDC from existing DB**: Complex, doesn't change data model
+3. **Hybrid**: Event source only for order state changes
+
+## Unresolved Questions
+
+- [ ] Event schema versioning strategy
+- [ ] Retention policy for events
+- [ ] Snapshot frequency for performance
+
+## Implementation Plan
+
+1. Prototype with single order type (2 weeks)
+2. Team training on event sourcing (1 week)
+3. Full implementation and migration (4 weeks)
+4. Monitoring and optimization (ongoing)
+
+## References
+
+- [Event Sourcing by Martin Fowler](https://martinfowler.com/eaaDev/EventSourcing.html)
+- [EventStoreDB Documentation](https://www.eventstore.com/docs)
+```
+
+## ADR Management
+
+### Directory Structure
+
+```
+docs/
+├── adr/
+│   ├── README.md           # Index and guidelines
+│   ├── template.md         # Team's ADR template
+│   ├── 0001-use-postgresql.md
+│   ├── 0002-caching-strategy.md
+│   ├── 0003-mongodb-user-profiles.md  # [DEPRECATED]
+│   └── 0020-deprecate-mongodb.md      # Supersedes 0003
+```
+
+### ADR Index (README.md)
+
+```markdown
+# Architecture Decision Records
+
+This directory contains Architecture Decision Records (ADRs) for [Project Name].
+
+## Index
+
+| ADR                                   | Title                              | Status     | Date       |
+| ------------------------------------- | ---------------------------------- | ---------- | ---------- |
+| [0001](0001-use-postgresql.md)        | Use PostgreSQL as Primary Database | Accepted   | 2024-01-10 |
+| [0002](0002-caching-strategy.md)      | Caching Strategy with Redis        | Accepted   | 2024-01-12 |
+| [0003](0003-mongodb-user-profiles.md) | MongoDB for User Profiles          | Deprecated | 2023-06-15 |
+| [0020](0020-deprecate-mongodb.md)     | Deprecate MongoDB                  | Accepted   | 2024-01-15 |
+
+## Creating a New ADR
+
+1. Copy `template.md` to `NNNN-title-with-dashes.md`
+2. Fill in the template
+3. Submit PR for review
+4. Update this index after approval
+
+## ADR Status
+
+- **Proposed**: Under discussion
+- **Accepted**: Decision made, implementing
+- **Deprecated**: No longer relevant
+- **Superseded**: Replaced by another ADR
+- **Rejected**: Considered but not adopted
+```
+
+### Automation (adr-tools)
+
+```bash
+# Install adr-tools
+brew install adr-tools
+
+# Initialize ADR directory
+adr init docs/adr
+
+# Create new ADR
+adr new "Use PostgreSQL as Primary Database"
+
+# Supersede an ADR
+adr new -s 3 "Deprecate MongoDB in Favor of PostgreSQL"
+
+# Generate table of contents
+adr generate toc > docs/adr/README.md
+
+# Link related ADRs
+adr link 2 "Complements" 1 "Is complemented by"
+```
+
+## Review Process
+
+```markdown
+## ADR Review Checklist
+
+### Before Submission
+
+- [ ] Context clearly explains the problem
+- [ ] All viable options considered
+- [ ] Pros/cons balanced and honest
+- [ ] Consequences (positive and negative) documented
+- [ ] Related ADRs linked
+
+### During Review
+
+- [ ] At least 2 senior engineers reviewed
+- [ ] Affected teams consulted
+- [ ] Security implications considered
+- [ ] Cost implications documented
+- [ ] Reversibility assessed
+
+### After Acceptance
+
+- [ ] ADR index updated
+- [ ] Team notified
+- [ ] Implementation tickets created
+- [ ] Related documentation updated
+```
+
+## Best Practices
+
+### Do's
+
+- **Write ADRs early** - Before implementation starts
+- **Keep them short** - 1-2 pages maximum
+- **Be honest about trade-offs** - Include real cons
+- **Link related decisions** - Build decision graph
+- **Update status** - Deprecate when superseded
+
+### Don'ts
+
+- **Don't change accepted ADRs** - Write new ones to supersede
+- **Don't skip context** - Future readers need background
+- **Don't hide failures** - Rejected decisions are valuable
+- **Don't be vague** - Specific decisions, specific consequences
+- **Don't forget implementation** - ADR without action is waste
+
+---
+
+## Attribution
+
+Forked from [wshobson/agents](https://github.com/wshobson/agents) — MIT licensed.
+See `LICENSE` in this directory for the original copyright and terms.
+
+## Content scope
+
+Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/.claude/commands/arch-design.md b/.claude/commands/arch-design.md
new file mode 100644
index 0000000..744b8d0
--- /dev/null
+++ b/.claude/commands/arch-design.md
@@ -0,0 +1,249 @@
+---
+name: arch-design
+description: Shape the architecture of a new or restructured software project through structured intake (quality attributes, stakeholders, constraints, scale, change drivers), then propose candidate architectural paradigms with honest trade-off analysis and a recommended direction. Paradigm-agnostic — evaluates options across layered, hexagonal, microservices, event-driven, CQRS, modular-monolith, serverless, pipe-and-filter, DDD, and others. Outputs a brief at `.architecture/brief.md` that downstream skills (arch-decide, arch-document, arch-evaluate) read. Use when starting a new project or service, restructuring an existing one, choosing a tech stack, or formalizing architecture before implementation. Do NOT use for bug fixing, code review, small feature additions, documenting an existing architecture (use arch-document), evaluating an existing architecture against a brief (use arch-evaluate), or recording specific individual decisions (use arch-decide). Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
+---
+
+# Architecture Design
+
+Elicit the problem, surface the real drivers, propose a fit, and commit it to writing. One working session yields a `.architecture/brief.md` that anchors every downstream architectural decision.
+
+## When to Use This Skill
+
+- Starting a new project, service, or major subsystem
+- Restructuring or re-platforming an existing system
+- Selecting a primary tech stack for a green-field effort
+- Establishing a formal architecture before a team scales
+- Preparing a spike or prototype you want to keep coherent
+
+## When NOT to Use This Skill
+
+- Bug fixing or defect investigation
+- Small feature additions inside an existing architecture
+- Recording a single decision (use `arch-decide`)
+- Producing formal documentation from a known architecture (use `arch-document`)
+- Auditing an existing codebase against its stated architecture (use `arch-evaluate`)
+
+## Workflow
+
+The skill runs in five phases. Each produces a section of the output brief. Do not skip phases — the trade-off analysis is only as good as the intake.
+
+1. **Intake** — elicit stakeholders, domain, scale, timeline, team
+2. **Quality attributes** — prioritize (can't have everything)
+3. **Constraints** — technical, organizational, legal, cost
+4. **Candidate paradigms** — shortlist 2-4 with honest trade-off analysis
+5. **Recommendation** — pick one, justify it, flag open questions for ADRs
+
+## Phase 1 — Intake
+
+Ask the user to answer each. Short answers are fine; vague answers mean return to the question.
+
+**System identity**
+- What is this system? (One sentence.)
+- Who uses it? Human end users, other services, both?
+- What domain is it in? (commerce, health, comms, finance, ops, etc.)
+
+**Scale**
+- Expected traffic at launch and in 12 months? (RPS, MAU, payload sizes)
+- Data volume at launch and in 12 months? (rows/docs, GB)
+- Geographic distribution? (single region, multi-region, edge)
+
+**Team**
+- Team size now? Growing?
+- Existing language/stack expertise?
+- Operational maturity? (on-call rotation, observability tooling, CI/CD)
+
+**Timeline**
+- When does it need to be in production?
+- Is this replacing something? What's the migration path?
+
+**Change drivers**
+- What forces are likely to reshape this system? (new markets, regulatory, volume, organizational)
+
+## Phase 2 — Quality Attributes
+
+List the relevant quality attributes and force the user to rank them 1-5 (1 = critical, 5 = nice-to-have). You can't optimize everything; the ranking surfaces real priorities.
+
+Standard list (use this set; add domain-specific ones if relevant):
+
+- **Performance** (latency, throughput under load)
+- **Scalability** (horizontal, vertical, elastic)
+- **Availability** (uptime target, failure tolerance)
+- **Reliability** (data durability, correctness under partial failure)
+- **Security** (authn/z, data protection, threat model)
+- **Maintainability** (readability, testability, onboarding speed)
+- **Observability** (logs, metrics, tracing, debuggability)
+- **Deployability** (release cadence, rollback speed)
+- **Cost** (infra, operational, total cost of ownership)
+- **Compliance** (regulatory, audit, data residency)
+- **Interoperability** (integration with existing systems, APIs, standards)
+- **Flexibility / evolvability** (ease of adding features, changing direction)
+
+Document the ranking verbatim in the brief. Future decisions hinge on it.
+
+## Phase 3 — Constraints
+
+Enumerate what's fixed. Each constraint narrows the design space — make them explicit so trade-offs don't hide.
+
+- **Technical**: existing infrastructure, mandated languages/platforms, integration points that can't move
+- **Organizational**: team structure (Conway's Law — the org shape becomes the arch shape), existing expertise, hiring plan
+- **Legal/regulatory**: GDPR, HIPAA, FedRAMP, data residency, audit retention
+- **Cost**: budget ceiling, licensing limits, infra cost targets
+- **Timeline**: hard dates from business, regulatory deadlines
+- **Compatibility**: backward-compatibility with existing clients, API contracts, data formats
+
+## Phase 4 — Candidate Paradigms
+
+Pick 2-4 candidates that plausibly fit the quality-attribute ranking and constraints. Analyze each honestly — include the reasons it would fail, not just succeed.
+
+Common paradigms to consider:
+
+| Paradigm | Fits when… | Doesn't fit when… |
+|---|---|---|
+| **Modular monolith** | Small team, fast iteration, strong module boundaries, single deployment OK | Independent team scaling, different availability SLAs per module, polyglot requirements |
+| **Microservices** | Multiple teams, independent deploy cadences, polyglot, different scaling needs | Small team, tight transactional consistency needs, low operational maturity |
+| **Layered (n-tier)** | CRUD-heavy, clear request/response, team familiar with MVC | Complex domain logic, event-driven needs, async workflows dominate |
+| **Hexagonal / Ports & Adapters** | Business logic isolation, multiple interface types (HTTP + CLI + queue), testability priority | Trivial domains, overhead outweighs benefit |
+| **Event-driven / pub-sub** | Async workflows, fan-out to multiple consumers, decoupled evolution | Strong ordering + consistency needs, small team, low operational maturity |
+| **CQRS** | Read/write workload asymmetry, different optimization needs, audit trail required | Simple CRUD, no asymmetry, overhead not justified |
+| **Event sourcing** | Audit trail critical, temporal queries needed, reconstruction from events valuable | Simple state, team lacks event-sourcing expertise, storage cost prohibitive |
+| **Serverless (FaaS)** | Event-driven + variable load + fast iteration + accept vendor lock-in | Steady high load, latency-sensitive, long-running processes, tight cost control |
+| **Pipe-and-filter / pipeline** | Data transformation workflows, ETL, stream processing | Interactive request/response, low-latency |
+| **Space-based / in-memory grid** | Extreme throughput, elastic scale, in-memory OK | Strong durability required from day one, small scale |
+| **DDD (tactical)** | Complex domain, domain experts available, long-lived system | Simple CRUD, no real domain complexity, short-lived system |
+
+For each candidate, document:
+
+- **Summary** — one paragraph on what the architecture would look like
+- **Why it fits** — map to the ranked quality attributes
+- **Why it might not** — the honest cons for this specific context
+- **Cost** — team learning curve, operational overhead, infra impact
+- **Open questions** — what would need to be answered or decided via ADR
+
+## Phase 5 — Recommendation
+
+Choose one paradigm. Justify it by:
+
+- Which top-3 quality attributes the choice serves
+- Which constraints it respects
+- Why the rejected alternatives were rejected (not just "we picked X"; say why *not* Y)
+- What you're trading away (be explicit)
+
+Flag items that need their own ADRs — use `arch-decide` to record them. Examples:
+
+- Primary database choice
+- Message bus vs. direct calls
+- Sync vs. async inter-service comms
+- Multi-tenancy approach
+- Authentication/authorization boundary
+
+## Output: `.architecture/brief.md`
+
+Write the final brief to `.architecture/brief.md` in the project root. Use this structure:
+
+```markdown
+# Architecture Brief — <Project Name>
+
+**Date:** <YYYY-MM-DD>
+**Authors:** <names>
+**Status:** Draft | Accepted | Revised
+
+## 1. System
+
+<One-paragraph description of what the system does.>
+
+## 2. Stakeholders and Users
+
+- **Primary users:** …
+- **Secondary users:** …
+- **Operators:** …
+- **Integrators / dependent systems:** …
+
+## 3. Scale Targets
+
+| Metric | Launch | +12 months |
+|---|---|---|
+| RPS | … | … |
+| Data volume | … | … |
+| Geographic spread | … | … |
+
+## 4. Quality Attributes (Prioritized)
+
+| Rank | Attribute | Notes |
+|---|---|---|
+| 1 | … | critical driver |
+| 2 | … | … |
+
+## 5. Constraints
+
+- **Technical:** …
+- **Organizational:** …
+- **Legal/Regulatory:** …
+- **Cost:** …
+- **Timeline:** …
+
+## 6. Candidates Considered
+
+### Candidate A — <paradigm>
+
+Summary. Fit. Cons. Cost. Open questions.
+
+### Candidate B — <paradigm>
+
+…
+
+## 7. Recommendation
+
+**Chosen paradigm:** …
+
+**Rationale:**
+- …
+
+**Trade-offs accepted:**
+- …
+
+**Rejected alternatives and why:**
+- …
+
+## 8. Open Decisions (Candidates for ADRs)
+
+- [ ] Primary database — driver: …
+- [ ] …
+
+## 9. Next Steps
+
+- Run `arch-decide` for each open decision above
+- Run `arch-document` to produce the full arc42 document
+- Run `arch-evaluate` once implementation begins to audit conformance
+```
+
+## Review Checklist
+
+Before marking the brief Accepted:
+
+- [ ] Every quality attribute is ranked; no "all critical"
+- [ ] Every constraint is explicit; no "we probably can't"
+- [ ] At least 2 candidates considered; each has real cons
+- [ ] Recommendation names what it's trading away
+- [ ] Open decisions listed; each will become an ADR via `arch-decide`
+- [ ] Stakeholders have seen and (ideally) approved
+
+## Anti-Patterns
+
+- **"All quality attributes are critical."** They aren't. Force ranking.
+- **"Let's go microservices."** Without a ranking, constraints, and team context, that's cargo-culting.
+- **Single candidate.** One option means the user already decided; you're documenting, not designing.
+- **Silent trade-offs.** If you choose availability, you're trading latency or cost. Say so.
+- **No open decisions.** A real architecture has open questions. Listing none means you haven't looked hard.
+- **"Design doc" with no implementation implications.** The brief must be actionable — it drives ADRs, documentation, and evaluation.
+
+## Hand-Off
+
+After the brief is Accepted:
+
+- `arch-decide` — record each Open Decision as an ADR under `docs/adr/`
+- `arch-document` — expand into a full arc42-structured document under `docs/architecture/` with C4 diagrams
+- `arch-evaluate` — once code exists, audit it against this brief
+
+## Content scope
+
+Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/.claude/commands/arch-document.md b/.claude/commands/arch-document.md
new file mode 100644
index 0000000..24a3720
--- /dev/null
+++ b/.claude/commands/arch-document.md
@@ -0,0 +1,317 @@
+---
+name: arch-document
+description: Produce a complete arc42-structured architecture document from a project's architecture brief and ADRs. Generates all twelve arc42 sections (Introduction & Goals, Constraints, Context & Scope, Solution Strategy, Building Block View, Runtime View, Deployment View, Crosscutting Concepts, Architecture Decisions, Quality Requirements, Risks & Technical Debt, Glossary). Dispatches to the c4-analyze and c4-diagram skills for building-block, container, and context diagrams. Outputs one file per section under `docs/architecture/`. Use when formalizing an architecture that already has a brief + ADRs, preparing documentation for a review, onboarding new engineers, or satisfying a compliance requirement. Do NOT use for shaping a new architecture (use arch-design), recording individual decisions (use arch-decide), auditing code against an architecture (use arch-evaluate), or for simple systems where a brief alone suffices. Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
+---
+
+# Architecture Documentation (arc42)
+
+Turn an architecture brief and a set of ADRs into a full arc42-structured document. The result is the authoritative reference for the system — engineers, auditors, and new hires read this to understand what exists and why.
+
+## When to Use This Skill
+
+- A project has completed `arch-design` and has a `.architecture/brief.md`
+- Open decisions from the brief have been answered via `arch-decide` (ADRs)
+- The team needs documentation suitable for review, onboarding, or compliance
+- An existing system needs retroactive architecture docs
+
+## When NOT to Use This Skill
+
+- Before a brief exists (run `arch-design` first)
+- For small systems where a brief alone is sufficient (don't over-document)
+- To record a single decision (use `arch-decide`)
+- To audit code (use `arch-evaluate`)
+- When the team doesn't need arc42 — a lighter structure may serve better
+
+## Inputs
+
+Expected files, in order of preference:
+
+1. `.architecture/brief.md` — output of `arch-design`
+2. `docs/adr/*.md` — ADRs from `arch-decide`
+3. Codebase (for inferring module structure, dispatched to `c4-analyze`)
+4. Existing `docs/architecture/` contents (if updating rather than creating)
+
+If `.architecture/brief.md` is absent, stop and tell the user to run `arch-design` first. Do not fabricate the brief.
+
+## Workflow
+
+1. **Load and validate inputs** — brief present? ADRs found? Any existing arc42 docs?
+2. **Plan section coverage** — which of the twelve sections need content from the brief, ADRs, code, or user?
+3. **Generate each section** — draft from available inputs; explicitly mark gaps
+4. **Dispatch to C4 skills** — for sections needing diagrams (Context & Scope, Building Block View, Runtime View, Deployment View)
+5. **Cross-link** — ensure every ADR is referenced; every diagram has a narrative
+6. **Output** — one file per section under `docs/architecture/`, plus an index
+
+## The Twelve arc42 Sections
+
+For each: what it's for, what goes in it, where the content comes from, and what to dispatch.
+
+### 1. Introduction and Goals → `01-introduction.md`
+
+**Purpose:** Why does this system exist? What are the top business goals?
+
+**Content:**
+- One-paragraph system description (from brief §1)
+- Top 3-5 business goals (from brief §1-2)
+- Top 3 stakeholders and their concerns (from brief §2)
+- Top 3 quality goals (from brief §4, pulling the top-ranked attributes)
+
+**Sources:** brief §1, §2, §4.
+
+### 2. Architecture Constraints → `02-constraints.md`
+
+**Purpose:** What's non-negotiable? What narrows the solution space?
+
+**Content:**
+- Technical constraints (stack mandates, integration points)
+- Organizational constraints (team, expertise, timeline)
+- Conventions (style guides, naming, review process)
+- Legal/regulatory/compliance
+
+**Sources:** brief §5. Expand with any inherited constraints not in the brief.
+
+### 3. Context and Scope → `03-context.md`
+
+**Purpose:** What's inside the system? What's outside? What are the boundaries?
+
+**Content:**
+- **Business context:** external users, upstream/downstream systems, human actors. One diagram.
+- **Technical context:** protocols, data formats, deployment boundaries. One diagram or table.
+
+**Dispatch:** call the `c4-diagram` skill with a C4 System Context diagram request. Pass the brief's §1-2 (system identity + stakeholders) + integration points from §5 constraints.
+
+### 4. Solution Strategy → `04-solution-strategy.md`
+
+**Purpose:** The fundamental decisions that shape everything else.
+
+**Content:**
+- Primary architectural paradigm (from brief §7, the recommendation)
+- Top-level technology decisions (link to ADRs)
+- Decomposition strategy (how the system breaks into modules/services)
+- Approach to the top 3 quality goals (one sentence each: how does the strategy achieve performance / availability / security / whatever ranked highest)
+
+**Sources:** brief §7, ADRs.
+
+### 5. Building Block View → `05-building-blocks.md`
+
+**Purpose:** Static decomposition. Layer-by-layer breakdown.
+
+**Content:**
+- **Level 1** (whiteboard view): the handful of major components
+- **Level 2** (for significant components): internal structure
+- For each component: responsibility, key interfaces, dependencies, quality properties
+
+**Dispatch:** call `c4-analyze` on the codebase if code exists; it produces Container and Component diagrams. Embed those diagrams + narrative here. If no code exists, call `c4-diagram` with the decomposition from brief §7.
+
+### 6. Runtime View → `06-runtime.md`
+
+**Purpose:** Dynamic behavior. The interesting interactions.
+
+**Content:** 3-5 scenarios that matter (not every flow — the ones that illustrate key architecture). Each scenario:
+- What triggers it
+- Which components participate
+- Data flow
+- Error handling / failure modes
+
+**Dispatch:** sequence diagrams via `c4-diagram`. One per scenario.
+
+### 7. Deployment View → `07-deployment.md`
+
+**Purpose:** Where does the system run? How is it packaged?
+
+**Content:**
+- Deployment environments (dev, staging, prod; regions; edge)
+- Infrastructure topology (VMs, containers, serverless, managed services)
+- Allocation of building blocks to infrastructure
+- Network boundaries, data flow across them
+
+**Dispatch:** deployment diagram via `c4-diagram`.
+
+### 8. Crosscutting Concepts → `08-crosscutting.md`
+
+**Purpose:** Concerns that span the system — not owned by one module.
+
+**Content:** one subsection per concept. Typical concepts:
+- Security (authn/z, data protection, secrets management)
+- Error handling (retries, circuit breakers, dead-letter queues)
+- Logging, metrics, tracing
+- Configuration and feature flags
+- Persistence patterns (ORM? repository? direct SQL?)
+- Concurrency model
+- Caching strategy
+- Internationalization
+- Testability approach
+
+Only include concepts that are actually crosscutting in *this* system. If error handling is owned by one service, it's not crosscutting.
+
+### 9. Architecture Decisions → `09-decisions.md`
+
+**Purpose:** Index of the significant decisions. Not the ADRs themselves — a curated summary.
+
+**Content:** a table linking out to every ADR in `docs/adr/`:
+
+| ADR | Decision | Status | Date |
+|---|---|---|---|
+| [0001](../adr/0001-database.md) | Primary database: PostgreSQL | Accepted | 2024-01-10 |
+
+Plus a one-sentence summary of *why this set of decisions, not others*. The meta-rationale.
+
+**Sources:** `docs/adr/*.md`. Auto-generate the table from the filesystem; update on each run.
+
+### 10. Quality Requirements → `10-quality.md`
+
+**Purpose:** Measurable quality targets. Not "the system should be fast" — specific scenarios.
+
+**Content:**
+- **Quality tree** — the ranked quality attributes from brief §4, each with refinements
+- **Quality scenarios** — specific, testable. Format: "Under [condition], the system should [response] within [measure]."
+
+Example:
+
+> Under peak traffic (10,000 concurrent users), a cart checkout should complete in under 2 seconds at the 95th percentile.
+
+Minimum 1 scenario per top-3 quality attribute.
+
+**Sources:** brief §4, §7 (rationale).
+
+### 11. Risks and Technical Debt → `11-risks.md`
+
+**Purpose:** Known risks and liabilities. A snapshot of what could go wrong.
+
+**Content:**
+
+| Risk / Debt | Impact | Likelihood | Mitigation / Plan |
+|---|---|---|---|
+| Single DB becomes bottleneck at 50k RPS | High | Medium (12mo) | Add read replicas; monitor query latency |
+
+Plus a short narrative on known technical debt, what caused it, and when it'll be addressed.
+
+**Sources:** brief §7 (trade-offs accepted), team knowledge. Refresh regularly.
+
+### 12. Glossary → `12-glossary.md`
+
+**Purpose:** Shared vocabulary. Terms, acronyms, and domain-specific words with agreed definitions.
+
+**Content:** table, alphabetical.
+
+| Term | Definition |
+|---|---|
+| Cart | A customer's in-progress selection of items before checkout. |
+| Order | A confirmed, paid transaction resulting from a checked-out cart. |
+
+Keep disciplined: when two people use the same word differently, the glossary is where the disagreement surfaces.
+
+## Output Layout
+
+```
+docs/architecture/
+├── README.md                  # Index; generated from the 12 sections
+├── 01-introduction.md
+├── 02-constraints.md
+├── 03-context.md
+├── 04-solution-strategy.md
+├── 05-building-blocks.md
+├── 06-runtime.md
+├── 07-deployment.md
+├── 08-crosscutting.md
+├── 09-decisions.md
+├── 10-quality.md
+├── 11-risks.md
+├── 12-glossary.md
+└── diagrams/                  # C4 outputs land here
+    ├── context.svg
+    ├── container.svg
+    ├── component-<module>.svg
+    └── runtime-<scenario>.svg
+```
+
+### README.md (auto-generated index)
+
+```markdown
+# Architecture Documentation — <Project Name>
+
+arc42-structured documentation. Read top to bottom for context; skip to a
+specific section via the index.
+
+## Sections
+
+1. [Introduction and Goals](01-introduction.md)
+2. [Architecture Constraints](02-constraints.md)
+3. [Context and Scope](03-context.md)
+4. [Solution Strategy](04-solution-strategy.md)
+5. [Building Block View](05-building-blocks.md)
+6. [Runtime View](06-runtime.md)
+7. [Deployment View](07-deployment.md)
+8. [Crosscutting Concepts](08-crosscutting.md)
+9. [Architecture Decisions](09-decisions.md)
+10. [Quality Requirements](10-quality.md)
+11. [Risks and Technical Debt](11-risks.md)
+12. [Glossary](12-glossary.md)
+
+## Source Documents
+
+- Brief: [`.architecture/brief.md`](../../.architecture/brief.md)
+- Decisions: [`../adr/`](../adr/)
+
+## Last Updated
+
+<YYYY-MM-DD> — regenerate by running `arch-document` after brief or ADR changes.
+```
+
+## Dispatch to C4 Skills
+
+Several sections need diagrams. The `arch-document` skill does not generate diagrams directly; it dispatches:
+
+- **Section 3 (Context and Scope)** → `c4-diagram` for a System Context diagram. Input: system name, external actors, external systems.
+- **Section 5 (Building Block View)** → `c4-analyze` if the codebase exists (it infers Container + Component diagrams). Otherwise `c4-diagram` with decomposition from brief.
+- **Section 6 (Runtime View)** → `c4-diagram` for sequence diagrams, one per scenario.
+- **Section 7 (Deployment View)** → `c4-diagram` for a deployment diagram.
+
+When dispatching, include: the relevant section narrative + the specific diagram type + any identifiers needed. Capture outputs in `docs/architecture/diagrams/` and embed via relative markdown image links.
+
+## Gap Handling
+
+If information is missing for a section:
+
+- **Brief gap:** stop and ask the user, or mark the section `TODO: resolve with arch-design`
+- **ADR gap:** stub the section with a `TODO: run arch-decide for <decision>` and proceed
+- **Code gap:** if dispatching to c4-analyze fails (no code yet), fall back to c4-diagram with brief-derived decomposition
+
+Never fabricate content. A `TODO` is honest; a plausible-sounding but made-up detail is misleading.
+
+## Review Checklist
+
+Before marking the documentation complete:
+
+- [ ] Every section has either real content or an explicit TODO
+- [ ] Every ADR is referenced in section 9
+- [ ] Diagrams are present for sections 3, 5, 6, 7
+- [ ] Quality scenarios (§10) are specific and measurable
+- [ ] Risks (§11) include likelihood and mitigation
+- [ ] Glossary (§12) captures domain terms, not just jargon
+- [ ] README index lists all sections with correct relative links
+
+## Anti-Patterns
+
+- **Template filling without thought.** Copying section headings but making up the content. If you don't have the data, mark TODO.
+- **Diagrams without narrative.** A diagram in isolation tells half the story. Each diagram needs a paragraph saying what to notice.
+- **Every concept as crosscutting.** If security is owned by one service, it's not crosscutting — put it in that service's building-block entry.
+- **Uncurated ADR dump.** Section 9 should be a curated index with the *reason* these decisions hang together, not just a directory listing.
+- **Vague quality requirements.** "The system should be fast" is useless. Specify scenarios with measurable bounds.
+- **Risk-free architecture.** Every real system has risks. A blank risks section means you didn't look.
+
+## Maintenance
+
+arc42 is a living document. Re-run `arch-document` when:
+
+- A new ADR is added → section 9 refreshes
+- The brief is revised → sections 1, 4, 10 may change
+- The codebase restructures → section 5 (via re-running c4-analyze)
+- A new deployment target is added → section 7
+- A scenario's behavior or SLO changes → section 6 or 10
+
+Commit each regeneration. The document's Git history is part of the architecture record.
+
+## Content scope
+
+Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/.claude/commands/arch-evaluate.md b/.claude/commands/arch-evaluate.md
new file mode 100644
index 0000000..f1c3391
--- /dev/null
+++ b/.claude/commands/arch-evaluate.md
@@ -0,0 +1,332 @@
+---
+name: arch-evaluate
+description: Audit an existing codebase against its stated architecture brief and ADRs. Runs framework-agnostic checks (cyclic dependencies, stated-layer violations, public API drift) that work on any language without setup, and opportunistically invokes language-specific linters (dependency-cruiser for TypeScript, import-linter for Python, go vet + depguard for Go) when they're already configured in the repo — augmenting findings, never replacing. Produces a report with severity levels (error / warning / info) and pointers to the relevant brief section or ADR for each violation. Use when auditing conformance before a release, during code review, when an architecture is suspected to have drifted, or as a pre-merge CI gate. Do NOT use for designing an architecture (use arch-design), recording decisions (use arch-decide), or producing documentation (use arch-document). Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
+---
+
+# Architecture Evaluation
+
+Audit an implementation against its stated architecture. Framework-agnostic by default; augmented by language-specific linters when they're configured in the repo. Reports violations with severity and the rule they break.
+
+## When to Use This Skill
+
+- Auditing a codebase before a major release
+- During code review of a structurally significant change
+- When architecture is suspected to have drifted from its documented intent
+- As a pre-merge check (output can feed CI)
+- Before an architecture review meeting
+
+## When NOT to Use This Skill
+
+- No brief or ADRs exist (run `arch-design` and `arch-decide` first)
+- Shaping a new architecture (use `arch-design`)
+- Recording a single decision (use `arch-decide`)
+- Generating documentation (use `arch-document`)
+- Deep code-quality review (this skill checks *structural* conformance, not line-level quality — use a code review skill for that)
+
+## Inputs
+
+1. `.architecture/brief.md` — the source of truth for paradigm, layers, quality attributes
+2. `docs/adr/*.md` — specific decisions that the code must honor
+3. `docs/architecture/*.md` — if present, used for additional structural context
+4. The codebase itself
+
+If the brief is missing, stop and tell the user to run `arch-design` first. Do not guess at intent.
+
+## Workflow
+
+1. **Load the brief and ADRs.** Extract: declared paradigm, layers (if any), forbidden dependencies, module boundaries, API contracts that matter.
+2. **Detect repo languages and linters.** Inspect `package.json`, `pyproject.toml`, `go.mod`, `pom.xml`, etc.
+3. **Run framework-agnostic checks.** Always. These never need tooling.
+4. **Run language-specific tools if configured.** Opportunistic — only if the repo already has `.dependency-cruiser.cjs`, `.importlinter`, `.golangci.yml` with import rules, etc. Never install tooling.
+5. **Combine findings.** Deduplicate across sources. Label each finding with provenance (native / tool).
+6. **Produce report.** Severity-sorted markdown at `.architecture/evaluation-<date>.md`.
+
+## Framework-Agnostic Checks
+
+These work on any language. Claude reads the code and applies the policy from the brief.
+
+### 1. Cyclic Dependencies
+
+Scan imports/requires/includes across the codebase. Build the module graph. Report any cycles.
+
+**Severity:** Error (cycles are almost always architecture bugs).
+
+**Scale limit:** for codebases over ~100k lines, this check is noisy — prefer the language-specific tool output (much faster and complete).
+
+### 2. Stated-Layer Violations
+
+If the brief declares layers (e.g., `presentation → application → domain → infrastructure`), scan imports for arrows that go the wrong way.
+
+**Policy format in the brief:**
+
+```
+Layers (outer → inner):
+  presentation → application → domain
+  presentation → application → infrastructure
+  application → infrastructure (repositories only)
+  domain → (none)
+```
+
+**Check:** each import statement's target must be reachable from the source via the declared arrows. Flag any that isn't.
+
+**Severity:** Error when the violation crosses a top-level layer. Warning when it's a within-layer oddity.
+
+### 3. Public API Drift
+
+If the brief or an ADR documents the public interface of a module/package (exported types, functions, endpoints), compare the declared interface to what the code actually exports.
+
+**Sources for expected interface:**
+
+- ADRs with code-block signatures
+- Brief §7 or Candidate description
+- Any `docs/architecture/05-building-blocks.md` section
+
+**Check:** listed public names exist; no additional exports are marked public unless recorded.
+
+**Severity:** Warning (intended additions may just lack an ADR yet).
+
+### 4. Open Decision vs Implementation
+
+For each item in brief §8 (Open Decisions): has it been decided via an ADR? Is the implementation consistent with that ADR?
+
+**Severity:** Warning (drift here usually means someone made a decision without recording it).
+
+### 5. Forbidden Dependencies
+
+The brief may list forbidden imports explicitly. Example: "Domain module must not import from framework packages (Django, FastAPI, etc.)." Check.
+
+**Severity:** Error.
+
+## Language-Specific Tools (Opportunistic)
+
+These run only if the user's repo has a config file already present. If not configured, skip silently — the framework-agnostic checks still run.
+
+### TypeScript — dependency-cruiser
+
+**Detected when:** `.dependency-cruiser.cjs`, `.dependency-cruiser.js`, or `dependency-cruiser` config in `package.json`.
+
+**Invocation:**
+
+```bash
+npx dependency-cruiser --validate .dependency-cruiser.cjs src/
+```
+
+**Parse:** JSON output (`--output-type json`). Each violation becomes a finding with severity mapped from the rule's `severity`.
+
+### Python — import-linter
+
+**Detected when:** `.importlinter`, `importlinter.ini`, or `[tool.importlinter]` in `pyproject.toml`.
+
+**Invocation:**
+
+```bash
+lint-imports
+```
+
+**Parse:** exit code + text output. Contract failures are errors; contract warnings are warnings.
+
+### Python — grimp (supplementary)
+
+If `import-linter` isn't configured but Python code is present, a lightweight check using `grimp` can still detect cycles:
+
+```bash
+python -c "import grimp; g = grimp.build_graph('your_package'); print(g.find_shortest_chains('a.module', 'b.module'))"
+```
+
+Skip unless the user enables it.
+
+### Go — go vet + depguard
+
+**Detected when:** `.golangci.yml` or `.golangci.yaml` contains a `depguard` linter entry.
+
+**Invocation:**
+
+```bash
+go vet ./...
+golangci-lint run --disable-all --enable=depguard ./...
+```
+
+**Parse:** standard Go-style output. Depguard violations mapped to layer rules in the brief.
+
+### Future Languages
+
+For Java, C, C++ — tooling exists (ArchUnit, include-what-you-use, cpp-linter) but isn't integrated in v1. Framework-agnostic checks still apply.
+
+## Tool Install Commands (for reference)
+
+Included so the skill can tell the user what to install if they want the tool-augmented checks. The skill never installs; the user does.
+
+### Python
+
+```bash
+pipx install import-linter     # or: pip install --user import-linter
+pipx install grimp             # optional, supplementary
+```
+
+### TypeScript / JavaScript
+
+```bash
+npm install --save-dev dependency-cruiser
+```
+
+### Go
+
+```bash
+# golangci-lint includes depguard
+brew install golangci-lint     # macOS
+# or
+go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
+```
+
+### Java (future v2+)
+
+```xml
+<!-- pom.xml -->
+<dependency>
+  <groupId>com.tngtech.archunit</groupId>
+  <artifactId>archunit-junit5</artifactId>
+  <version>1.3.0</version>
+  <scope>test</scope>
+</dependency>
+```
+
+### C / C++ (future v2+)
+
+```bash
+brew install include-what-you-use
+# or on Linux: package manager, or build from source
+```
+
+## Config Generation
+
+The skill does not auto-generate linter configs. That's v2+. For now, if the user wants tool-augmented checks, they configure the linter themselves, using the brief as a guide. Document the desired rules in the brief so humans (and future Claude sessions) can translate consistently.
+
+Example — mapping brief layers to `import-linter` contracts:
+
+```ini
+# .importlinter
+[importlinter]
+root_package = myapp
+
+[importlinter:contract:layers]
+name = Core layers
+type = layers
+layers =
+    myapp.presentation
+    myapp.application
+    myapp.domain
+    myapp.infrastructure
+```
+
+## Output: `.architecture/evaluation-<date>.md`
+
+Write the report to `.architecture/evaluation-<YYYY-MM-DD>.md`. Use this structure:
+
+```markdown
+# Architecture Evaluation — <Project Name>
+
+**Date:** <YYYY-MM-DD>
+**Brief version:** <commit hash or date of `.architecture/brief.md`>
+**Checks run:** framework-agnostic + <detected tools>
+
+## Summary
+
+- **Errors:** <N>
+- **Warnings:** <N>
+- **Info:** <N>
+
+## Findings
+
+### Errors
+
+#### E1. Cyclic dependency: domain/user ↔ domain/order
+
+- **Source:** framework-agnostic
+- **Files:** `src/domain/user.py:14`, `src/domain/order.py:7`
+- **Rule:** Brief §7 — "Domain modules must not form cycles."
+- **Fix:** extract shared abstraction into a new module, or break the cycle by inverting one direction.
+
+#### E2. Forbidden dependency: domain imports Django
+
+- **Source:** import-linter (contract: framework_isolation)
+- **Files:** `myapp/domain/user.py:3`
+- **Rule:** Brief §5 — "Domain layer must not depend on framework packages."
+- **Related ADR:** [ADR-0004 Framework isolation](../docs/adr/0004-framework-isolation.md)
+- **Fix:** move the Django-specific logic to `myapp/infrastructure`.
+
+### Warnings
+
+#### W1. Public API drift: `OrderService.cancel()` added without ADR
+
+- **Source:** framework-agnostic
+- **File:** `src/domain/order.py:142`
+- **Rule:** Brief §8 — "Public API additions require an ADR."
+- **Fix:** run `arch-decide` to record the rationale, or make `cancel()` non-public.
+
+### Info
+
+#### I1. Open decision unresolved: message bus vs direct calls
+
+- **Source:** framework-agnostic
+- **Rule:** Brief §8 item 3.
+- **Fix:** run `arch-decide` to select and record.
+
+## Tool Output (raw)
+
+<Collapsed raw output from each tool that ran, for reference.>
+
+## Next Steps
+
+- Address all Errors before merge
+- Triage Warnings: either fix, record as ADR, or update the brief
+- Review Info items at the next architecture meeting
+```
+
+## Severity Mapping
+
+| Severity | Meaning | Action |
+|---|---|---|
+| **Error** | Structural violation of declared architecture. Code contradicts brief/ADR. | Block merge. Fix or update the brief (with an ADR). |
+| **Warning** | Deviation that may be intentional but wasn't recorded. | Discuss. Record via `arch-decide` or fix. |
+| **Info** | Open question or unresolved decision. Not a violation, but attention-worthy. | Triage. |
+
+## Review Checklist
+
+Before handing off the report:
+
+- [ ] All framework-agnostic checks ran
+- [ ] Detected linters ran if configured; skipped silently if not
+- [ ] Each finding has: severity, source (native or tool name), file/line, rule reference, suggested fix
+- [ ] Each finding links to the brief section or ADR that establishes the rule
+- [ ] Raw tool output preserved at the bottom for traceability
+- [ ] Report timestamped and commit-referenced
+
+## Anti-Patterns
+
+- **Silent failures.** A tool that errored out is a finding too — report it (Info: "dependency-cruiser failed to parse config; no TypeScript import checks ran").
+- **Swallowing open decisions.** An unresolved item in brief §8 is a legitimate warning. Don't treat it as "not a violation."
+- **Tool-only reports.** If the language-specific tool is configured and clean, don't skip the framework-agnostic checks — they cover things the tool doesn't (API drift, open decisions).
+- **Rule references missing.** Every finding must cite the rule. If you can't find the rule, the finding isn't actionable.
+- **Error inflation.** Reserve Error for real violations. Warnings exist to avoid crying wolf.
+
+## Integration with CI (future)
+
+v1 produces a markdown report. v2+ will add:
+
+- JSON output (machine-readable)
+- Exit-code behavior (non-zero when any Error)
+- `--fail-on-warnings` flag for strict mode
+- Delta mode (only report *new* findings since last evaluation)
+- Auto-generation of linter configs from the brief
+
+See `docs/architecture/v2-todo.org` in the rulesets repo for the full deferred list.
+
+## Hand-Off
+
+After the report:
+
+- **Errors** → fix or justify (each justification becomes an ADR via `arch-decide`)
+- **Warnings** → record ADRs for the deliberate ones; fix the rest
+- **Info** → resolve open decisions via `arch-decide`
+- After significant fixes, re-run `arch-evaluate` to verify
+- Consider re-running `arch-document` if the brief or ADRs changed as a result
diff --git a/.claude/commands/brainstorm.md b/.claude/commands/brainstorm.md
new file mode 100644
index 0000000..b47523f
--- /dev/null
+++ b/.claude/commands/brainstorm.md
@@ -0,0 +1,195 @@
+---
+name: brainstorm
+description: Refine a vague idea into a validated design through structured one-question-at-a-time dialogue, diverse option exploration (three conventional + three tail samples), and chunked validation (200-300 words at a time). Produces `docs/design/<topic>.md` as the output artifact. Use when shaping a new feature, service, or workflow before implementation begins — or when a "we should probably…" idea needs to become concrete enough to build. Do NOT use for mechanical well-defined work (renames, reformats, dependency bumps), for system-level architecture choices (use arch-design), for recording a single decision that has already been made (use arch-decide), or for debugging an existing error (use root-cause-trace or debug). Synthesized from the Agentic-Context-Engineering / SDD brainstorm pattern — probabilistic diversity sampling originated there.
+---
+
+# Brainstorm
+
+Turn a rough idea into a design document concrete enough to implement. One question at a time, diverse options considered honestly, design validated in chunks.
+
+## When to Use
+
+- Shaping a new feature, component, service, or workflow before code exists
+- Translating "we should probably …" into something buildable
+- Converting a noticed-but-unshaped problem into a design
+- Exploring whether an idea is worth building before committing to it
+
+## When NOT to Use
+
+- Mechanical, well-defined work (rename, reformat, upgrade a dependency, apply a migration)
+- System-level architecture (use `arch-design`)
+- Recording a specific decision already made (use `arch-decide`)
+- Debugging an existing error (use `root-cause-trace` or `debug`)
+- Writing code whose shape is already clear to you
+
+## Workflow
+
+Three phases. Each converges a little more. Each is validated before moving to the next.
+
+### Phase 1 — Understand the Idea
+
+Dialogue, not interrogation. Before the first question, read the project context: the relevant directory, recent commits, any existing docs, the language and stack. Ground your questions in what's already there.
+
+**Rules:**
+
+- **One question per message.** Never batch. Respondents answer the easiest question and skip the rest when you batch.
+- **Prefer multiple choice.** Easier to answer, faster to skim, surfaces the option space for free.
+- **Open-ended when the option space is too large to enumerate.** Then refine with follow-ups.
+- **Focus the first questions on purpose, not mechanism.** Mechanism comes in phase 2.
+
+**Topics to cover (not a script — skip what's already clear):**
+
+- What problem does this solve?
+- Who is the user or caller?
+- What's the smallest version that would be useful?
+- What's explicitly out of scope?
+- What are the success criteria (measurable if possible)?
+- What constraints apply — team size, timeline, existing code, stack?
+
+**Stop when** you can state the idea back in one sentence and the user confirms. Don't keep asking for the sake of thoroughness.
+
+### Phase 2 — Explore Approaches
+
+Before committing to a direction, generate **six candidate approaches**. Force diversity.
+
+**Composition:**
+
+- **Three conventional approaches** — the paths a reasonable engineer with relevant experience would consider. Each has a high prior probability of being right (roughly ≥80%).
+- **Three tail samples** — approaches from genuinely different regions of the solution space. Each individually unlikely (roughly ≤10%), but together they expand what's been considered. These are the ones that surprise.
+
+Why tail samples matter: most teams converge on the first conventional option. The tail samples either reveal a better solution you hadn't considered, or they clarify *why* the conventional option is the right one. Either way, the recommendation that follows is stronger.
+
+**For each candidate, state:**
+
+- One-paragraph summary
+- Honest pros
+- Honest cons (no selling; if you can't name real cons, you haven't thought hard enough)
+
+**End with:**
+
+- Your recommendation
+- Why — explicit mapping to the constraints and success criteria from phase 1
+- What's being traded away
+- What becomes an open decision for `arch-decide` later
+
+### Phase 3 — Present the Design
+
+Once a direction is chosen, describe it in **200-300 word chunks**. After each chunk, ask "does this look right so far?" Never dump a wall of design prose — the user skims walls.
+
+**Typical sections, one chunk each:**
+
+1. **Architecture** — components, boundaries, key interfaces
+2. **Data flow** — what moves through where, in what shape
+3. **Persistence** — what's stored, where, for how long, with what durability
+4. **Error handling** — expected failures, responses, user-facing behavior
+5. **Testing approach** — what correctness means here, how it gets verified
+6. **Observability** — what gets logged, measured, traced
+
+Skip sections that don't apply. Add domain-specific ones (auth flow, concurrency model, migration plan, rollout strategy) where relevant.
+
+**Be willing to back up.** If a chunk surfaces a question that invalidates an earlier chunk, revise. Committing to a wrong direction to avoid rework is the expensive path.
+
+## Output
+
+Write the validated design to `docs/design/<topic>.md`. Use this skeleton (omit sections that don't apply):
+
+```markdown
+# Design: <topic>
+
+**Date:** <YYYY-MM-DD>
+**Status:** Draft | Accepted | Implemented
+
+## Problem
+
+One paragraph. What, who, why now.
+
+## Non-Goals
+
+Bullet list of what this explicitly does not address. Prevents scope creep.
+
+## Approaches Considered
+
+### Recommended: <name>
+
+Summary, pros, cons.
+
+### Rejected: <name>
+
+Why not. Brief.
+
+(Include 2-3 rejected options — showing alternatives were weighed is itself valuable.)
+
+## Design
+
+### Architecture
+
+### Data Flow
+
+### Persistence
+
+### Error Handling
+
+### Testing
+
+### Observability
+
+## Open Questions
+
+Items that need decisions before or during implementation.
+
+- [ ] Question — likely candidate for an ADR via `arch-decide`
+- [ ] Question — …
+
+## Next Steps
+
+- Open questions → `arch-decide`
+- If this implies system-level structural change → `arch-design`
+- Implementation → <agreed next action>
+```
+
+If `docs/design/` doesn't exist, create it. If a design with the same topic name exists, ask before overwriting.
+
+## Hand-Off
+
+After the design is written and agreed:
+
+- **Each open question** → run `arch-decide` to record as an ADR
+- **System-level implications** → run `arch-design` to update the architecture brief
+- **Implementation** → whatever the project's implementation path is (`start-work`, `add-tests`, or direct work)
+
+Link the design doc from wherever it's being implemented (PR description, ticket, commit).
+
+## Review Checklist
+
+Before declaring the design accepted:
+
+- [ ] Problem stated in one paragraph that a newcomer could understand
+- [ ] Non-goals listed (at least 1-2)
+- [ ] At least 6 approaches considered, with 3 genuinely in the tail
+- [ ] Recommendation names what it's trading away
+- [ ] Each design chunk was individually validated
+- [ ] Open questions listed; each has a clear path forward
+- [ ] User has confirmed the design reflects their intent
+
+## Anti-Patterns
+
+- **Asking three questions in one message.** The user answers the easiest. Ask one.
+- **Leading questions.** "Don't you think Postgres is right here?" isn't exploration.
+- **Skipping tail samples.** If all six options are minor variations on the conventional answer, diversity wasn't actually attempted.
+- **Dumping the whole design at once.** Eight hundred words without validation checkpoints means the user skims and misses the thing they'd want to push back on.
+- **Hiding trade-offs.** Every approach has real cons. State them or the evaluation is theatre.
+- **"We'll figure it out in implementation."** If a design question is ducked here, it becomes a larger, costlier problem during coding. Resolve now or explicitly defer with an ADR.
+- **Over-specifying.** The design should be detailed enough to implement, not so detailed it's already the implementation. If you're writing function bodies in the design, stop.
+
+## Key Principles
+
+- **One question at a time.** Non-negotiable.
+- **Multiple choice beats open-ended** when the option space is small enough to enumerate.
+- **Six approaches, three in the tail.** Discipline around diversity.
+- **Validate in chunks.** 200-300 words, then check.
+- **YAGNI ruthlessly.** Remove anything not justified by the problem statement.
+- **Back up when needed.** Revising early beats committing to wrong.
+
+## Content scope
+
+Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/.claude/commands/c4-analyze.md b/.claude/commands/c4-analyze.md
new file mode 100644
index 0000000..ab8986b
--- /dev/null
+++ b/.claude/commands/c4-analyze.md
@@ -0,0 +1,212 @@
+---
+name: c4-analyze
+description: Analyze a codebase or git repo and generate C4 architecture diagrams (System Context, Container, Component). Use when the user wants to visualize or document the architecture of an existing project. Dispatched by arch-document for building-block and container views; usable standalone for quick architecture visualization. Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
+argument-hint: "[path-or-container-name]"
+---
+
+# /c4-analyze — Generate C4 Architecture Diagrams from Code
+
+Analyze the current codebase (or a specified path) and generate C4 model diagrams
+following Simon Brown's methodology.
+
+## Arguments
+
+- No argument: analyze the current working directory
+- A path (e.g., `./services/api`): analyze that subdirectory
+- A container name from a previous analysis (e.g., `api-server`): generate a Level 3 Component diagram for that container
+
+## Step 1: Reconnaissance
+
+Scan the repository to understand its structure. Look for these signals:
+
+### Project boundaries and workspaces
+- `pnpm-workspace.yaml`, `lerna.json`, `nx.json`, `turbo.json` → JS/TS monorepo
+- `go.work`, multiple `go.mod` files → Go workspace
+- `Cargo.toml` with `[workspace]` → Rust workspace
+- Multiple `pom.xml` or `settings.gradle` → Java multi-module
+- Multiple `pyproject.toml`, `setup.py`, or a `packages/` directory → Python monorepo
+
+### Containers (deployable units)
+- `Dockerfile`, `docker-compose.yml`, `docker-compose.yaml` → containerized services
+- Kubernetes manifests (`k8s/`, `*.yaml` with `kind: Deployment`) → k8s services
+- `serverless.yml`, `sam-template.yaml`, AWS CDK/Terraform with Lambda → serverless functions
+- Separate apps with their own entry points (`main.go`, `manage.py`, `index.ts`, `Program.cs`)
+- Database migration directories (`migrations/`, `alembic/`, `flyway/`)
+
+### Technology detection
+- Package managers: `package.json`, `requirements.txt`, `pyproject.toml`, `go.mod`, `Cargo.toml`, `pom.xml`, `Gemfile`, `mix.exs`
+- Frameworks: look at imports/dependencies (Django, Flask, FastAPI, Express, Next.js, Spring Boot, Rails, Phoenix, etc.)
+- Databases: connection strings in config, ORM models, migration tools
+- Message queues: Kafka, RabbitMQ, Redis, SQS client imports
+
+### External system integrations
+- HTTP client calls to external APIs (look for base URLs, API client classes)
+- SDK imports (AWS SDK, Stripe, Twilio, SendGrid, etc.)
+- OAuth/SAML/SSO integrations
+- Third-party service configuration (environment variables referencing external URLs)
+
+### Users/actors
+- Authentication providers → user types
+- API key/token auth → machine clients
+- Admin interfaces → admin users
+- Public vs internal endpoints → different user groups
+
+## Step 2: Build the Model
+
+Organize findings into the C4 hierarchy:
+
+```
+Software System
+├── People (users, roles, external actors)
+├── External Systems (third-party services, APIs)
+└── Containers (your deployable units)
+    ├── Name, technology, description
+    ├── Relationships to other containers
+    └── Components (for drill-down)
+        ├── Name, technology, description
+        └── Relationships to other components
+```
+
+For each element, capture:
+- **Name**: Clear, specific (not "business logic" or "backend")
+- **Technology**: The actual framework/language/platform
+- **Description**: One sentence about its responsibility
+- **Relationships**: What it talks to, how (protocol), and why (purpose)
+
+## Step 3: Generate Diagrams
+
+Generate diagrams as **draw.io XML** (`.drawio` files). Always generate Level 1 and Level 2.
+
+### draw.io XML format
+
+Use the `<mxfile><diagram><mxGraphModel>` structure with `<mxCell>` elements.
+
+### C4 color scheme
+
+| Element | Fill Color | Stroke Color | Font Color | Shape |
+|---------|-----------|-------------|-----------|-------|
+| Person | `#08427B` | `#073B6F` | `#ffffff` | `shape=mxgraph.c4.person2` |
+| System (in scope) | `#1168BD` | `#0B4884` | `#ffffff` | `rounded=1;arcSize=10` |
+| System (external) | `#999999` | `#8A8A8A` | `#ffffff` | `rounded=1;arcSize=10` |
+| Container | `#438DD5` | `#3C7FC0` | `#ffffff` | `rounded=1;arcSize=10` |
+| Container (database) | `#438DD5` | `#3C7FC0` | `#ffffff` | `shape=cylinder3` |
+| Component | `#85BBF0` | `#78A8D8` | `#000000` | `rounded=1;arcSize=10` |
+| Planned/future | `#CCCCCC` | `#AAAAAA` | `#ffffff` | `dashed=1` |
+| Boundary | none | `#0B4884` (system) / `#3C7FC0` (container) | match stroke | `swimlane;dashed=1;dashPattern=8 4` |
+
+### Element label format
+
+Every element label should include (using HTML in the `value` attribute):
+- **Bold name**
+- Element type in small text: `[Person]`, `[Software System]`, `[Container: Technology]`, `[Component: Technology]`
+- Description in normal text
+
+Example: `<b>API Server</b><br><font style="font-size:10px">[Container: Django 6.0, Python 3.12]</font><br><br><font style="font-size:11px">REST API handling mission CRUD and agent orchestration</font>`
+
+### Diagram structure rules
+
+- **Title**: Text cell at the top with `fontSize=18;fontStyle=1`
+- **Key/legend**: Small boxes in a corner showing the color scheme
+- **Relationships**: `edgeStyle=orthogonalEdgeStyle;rounded=1` with label showing purpose + protocol. Always use proper `source` and `target` attributes referencing element IDs — never use manual `sourcePoint`/`targetPoint` coordinates, which render as floating disconnected lines.
+- **Boundaries**: Use `swimlane` style with `container=1;collapsible=0` — child elements use `parent="boundaryId"`
+- **Layout**: People at top, system in scope in center, external systems around the edges or bottom. Most important element centered.
+- **Canvas size**: `pageWidth="1600" pageHeight="1200"` minimum; increase for complex diagrams
+
+### Spacing and layout rules (critical)
+
+These rules prevent overlapping text and unreadable diagrams:
+
+- **140px minimum** between context elements (people, external containers) and the boundary top, so edge labels don't overlap the boundary title text.
+- **200px minimum** between boundary bottom and external systems below, so edge labels between them are readable and don't overlap the boundary line.
+- **150px minimum gap** between sibling elements in the same row (containers, components), so relationship labels between them have room.
+- **Center contents inside boundaries**: calculate total content width, subtract from boundary width, divide by 2 for left margin. Do this for each row independently.
+- **Use explicit waypoints** (`<Array as="points">`) to route edges around obstacles. Waypoints should run through the gap between the boundary edge and external systems.
+
+### Edge routing rules
+
+- **Never route an edge through a component or container.** If an edge would cross an element, spread elements apart to create a clear lane, or reroute the edge with waypoints.
+- **Edges that skip rows must route around, not through.** If a service in row 2 connects to an external system below the boundary, and row 3 (data stores) is in between, the edge must route along the left or right margin of the boundary and exit from the bottom — never vertically through row 3. Use waypoints to hug the boundary edge.
+- **Cross-row edges** (e.g., connecting elements that aren't adjacent) must route above or below the intervening row using waypoints, not through it.
+- **Edges exiting a boundary** to reach external systems should use waypoints at intermediate Y positions between the boundary bottom and the external system top, with each edge at a different Y to avoid overlapping.
+- **All edges must use proper `source` and `target` attributes** referencing element IDs. Never use `sourcePoint`/`targetPoint` manual coordinates — these create disconnected lines and make the audit untraceable.
+
+### Level 1: System Context
+
+Show the software system as a single box, surrounded by users and external systems.
+
+### Level 2: Container
+
+Zoom into the system boundary (drawn as a dashed swimlane) to show containers with technology choices. Include people and external systems outside the boundary for context.
+
+### Level 3: Component (on request or when drilling into a container)
+
+Zoom into a single container (drawn as a dashed swimlane) to show its components. Include surrounding containers and external systems for context.
+
+## Step 4: Layout Audit (mandatory — do this before saving)
+
+After generating the initial diagram XML, perform this audit. If any check fails, adjust the layout and re-check. Repeat until all checks pass.
+
+### 4a. Trace every edge path
+
+For each edge, compute the full path by listing the absolute coordinates of every point: source exit point → each waypoint → target entry point. Each pair of consecutive points forms a segment. For each segment, check if the line intersects the bounding box (x, y, width, height) of every element in the diagram. If any segment crosses through any element:
+- Route the edge around the element using waypoints along the boundary margin, OR
+- Spread elements apart to create a clear lane
+
+Special attention: edges connecting to external systems below the boundary often have a long vertical segment dropping through lower rows. These MUST route along the left or right margin first, then drop below the boundary, then turn horizontally to reach the target.
+
+### 4b. Check edge label positions for overlaps
+
+draw.io places edge labels at the midpoint of the longest segment of an orthogonal edge. For each edge label:
+1. Identify the longest segment of the edge path (the segment with the greatest length).
+2. The label will be centered on this segment's midpoint.
+3. Estimate the label's bounding box conservatively: use **12px per character width** and **20px per line height**, centered on the midpoint. Over-estimating is better than under-estimating — a little extra spacing is preferable to overlapping text.
+4. Check if this bounding box overlaps with:
+   - Any element (container, component, person, external system)
+   - The boundary title text (the swimlane header area)
+   - Any other edge label's estimated bounding box
+
+**Critical:** If the longest segment is a vertical drop adjacent to a container (e.g., the last segment before the target), the label will render on top of that neighboring container. To fix this, add waypoints so that the longest segment is in open space — typically a horizontal run through a clear gap between rows.
+
+If an overlap is found:
+- Add waypoints to change which segment is longest, moving where the label lands, OR
+- Shift the edge's exit/entry points to move the longest segment to a clear area, OR
+- Increase spacing between the elements the edge connects
+
+### 4c. Check parallel and bundled edges
+
+**No two edges may share the same waypoint Y or X value.** When edges share a coordinate, draw.io renders them on top of each other, making it impossible to trace which line goes where. For every pair of edges:
+- Each edge must use a **unique waypoint Y** (for horizontal runs) or **unique waypoint X** (for vertical runs), separated by at least **30px**.
+- Source exit points from the same element must be spread across different positions (e.g., exitX=0.2 vs exitX=0.5 vs exitX=0.8). Never use the same exit point for two edges.
+- Target entry points on the same element must also be spread (e.g., entryX=0.3 vs entryX=0.7).
+- If two edges run parallel for any segment, offset one of them by at least 30px using additional waypoints.
+
+Labels on parallel edges will also overlap if the edges are close. Since labels are centered on the longest segment, ensure the longest segments of nearby edges are in **different spatial areas** — either at different Y levels or with enough horizontal separation (at least 200px between label centers).
+
+### 4d. Verify boundary title clearance
+
+Check that no edge or edge label passes through or overlaps the boundary's swimlane header (the `startSize` area, typically 30px tall at the top of the boundary). If edges enter the boundary from above, they should connect at the boundary top edge, not route through the title.
+
+## Step 5: Output
+
+1. **Save the draw.io XML** to the project root:
+   - Filenames: `system-context.drawio`, `containers.drawio`, `components-[container-name].drawio`
+
+2. **Export to PNG** by running `drawio --export --format png --output <name>.png <name>.drawio` via Bash.
+
+3. **Open in draw.io desktop** by running `drawio <filepath>` via Bash.
+
+4. **Print a summary** of what was discovered:
+   - Number of containers, components, external systems, and user types identified
+   - Key technology choices detected
+   - Any ambiguities or areas where manual refinement is recommended
+
+## Guidelines
+
+- **Be specific, not generic.** "FastAPI REST server handling order management" beats "API server."
+- **Include technology choices.** The actual framework, database, protocol — not just "web app" or "database."
+- **Annotate every relationship** with its purpose and protocol/mechanism.
+- **Don't mix abstraction levels.** A container diagram shows containers, not classes.
+- **For monorepos:** Start with the top-level scan to show all services as containers. Offer to drill into specific services for components.
+- **When uncertain:** Note the ambiguity in the summary rather than guessing. Flag it for the user to clarify.
+- **Omit infrastructure cross-cutting concerns** (logging, monitoring) from component diagrams unless they are architecturally significant.
+- **Relationships should use prepositions** that match arrow direction: "reads from," "sends to," "makes API calls to."
diff --git a/.claude/commands/c4-diagram.md b/.claude/commands/c4-diagram.md
new file mode 100644
index 0000000..15948e4
--- /dev/null
+++ b/.claude/commands/c4-diagram.md
@@ -0,0 +1,199 @@
+---
+name: c4-diagram
+description: Generate C4 architecture diagrams from a textual description of a software system. Use when the user describes a system they want to diagram, or wants to create architecture diagrams for a planned/proposed system. Dispatched by arch-document for context and container views; usable standalone when the system is described in prose rather than existing code. Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
+argument-hint: "[description or diagram level]"
+---
+
+# /c4-diagram — Generate C4 Architecture Diagrams from Description
+
+Generate C4 model architecture diagrams based on a conversational description of a
+software system, following Simon Brown's methodology.
+
+## Arguments
+
+- A description of the system (e.g., `/c4-diagram e-commerce platform with React frontend and Python backend`)
+- A diagram level to generate (e.g., `/c4-diagram component api-server` — if a model has already been established in conversation)
+- No argument: ask the user to describe their system
+
+## Flow
+
+### Phase 1: Understand the System
+
+If the user provides a brief description, ask clarifying questions to fill gaps.
+Ask only what's needed — don't interrogate. Group questions logically.
+
+**Essential questions (ask if not provided):**
+
+1. **What does the system do?** (one-sentence purpose)
+2. **Who uses it?** (user types/roles — end users, admins, other systems, scheduled jobs)
+3. **What are the major pieces?** (web app, mobile app, API, database, workers, etc.)
+
+**Follow-up questions (ask if relevant, based on what was shared):**
+
+4. **What external systems does it integrate with?** (payment providers, email services, auth providers, third-party APIs)
+5. **What are the key technology choices?** (languages, frameworks, databases, message queues)
+6. **How do the pieces communicate?** (REST, GraphQL, gRPC, message queues, webhooks)
+
+Don't ask all questions at once. Start with 1-3, then follow up based on the answers.
+
+### Phase 2: Build the Model
+
+Organize the information into the C4 hierarchy:
+
+- **People**: Users, roles, external actors
+- **Software System**: The system being described (the thing in scope)
+- **External Systems**: Third-party services, legacy systems, partner APIs
+- **Containers**: The major deployable/runnable pieces
+- **Components**: Internal groupings within containers (only if user wants Level 3)
+
+For each element, define:
+- **Name**: Clear and specific
+- **Technology**: If known or decided; note "TBD" if still under consideration
+- **Description**: One sentence about its responsibility
+- **Relationships**: What connects to what, the purpose, and the mechanism
+
+### Phase 3: Generate Diagrams
+
+Generate diagrams as **draw.io XML** (`.drawio` files).
+
+Default behavior: generate Level 1 (System Context) and Level 2 (Container).
+Generate Level 3 (Component) only when the user asks to drill into a specific container.
+
+#### draw.io XML format
+
+Use the `<mxfile><diagram><mxGraphModel>` structure with `<mxCell>` elements.
+
+#### C4 color scheme
+
+| Element | Fill Color | Stroke Color | Font Color | Shape |
+|---------|-----------|-------------|-----------|-------|
+| Person | `#08427B` | `#073B6F` | `#ffffff` | `shape=mxgraph.c4.person2` |
+| System (in scope) | `#1168BD` | `#0B4884` | `#ffffff` | `rounded=1;arcSize=10` |
+| System (external) | `#999999` | `#8A8A8A` | `#ffffff` | `rounded=1;arcSize=10` |
+| Container | `#438DD5` | `#3C7FC0` | `#ffffff` | `rounded=1;arcSize=10` |
+| Container (database) | `#438DD5` | `#3C7FC0` | `#ffffff` | `shape=cylinder3` |
+| Component | `#85BBF0` | `#78A8D8` | `#000000` | `rounded=1;arcSize=10` |
+| Planned/future | `#CCCCCC` | `#AAAAAA` | `#ffffff` | `dashed=1` |
+| Boundary | none | `#0B4884` (system) / `#3C7FC0` (container) | match stroke | `swimlane;dashed=1;dashPattern=8 4` |
+
+#### Element label format
+
+Every element label should include (using HTML in the `value` attribute):
+- **Bold name**
+- Element type in small text: `[Person]`, `[Software System]`, `[Container: Technology]`, `[Component: Technology]`
+- Description in normal text
+
+Example: `<b>API Server</b><br><font style="font-size:10px">[Container: Django 6.0, Python 3.12]</font><br><br><font style="font-size:11px">REST API handling mission CRUD and agent orchestration</font>`
+
+#### Diagram structure rules
+
+- **Title**: Text cell at top with `fontSize=18;fontStyle=1`
+- **Key/legend**: Small boxes in a corner showing the color scheme
+- **Relationships**: `edgeStyle=orthogonalEdgeStyle;rounded=1` with label showing purpose + protocol. Always use proper `source` and `target` attributes referencing element IDs — never use manual `sourcePoint`/`targetPoint` coordinates, which render as floating disconnected lines.
+- **Boundaries**: Use `swimlane` style with `container=1;collapsible=0` — child elements use `parent="boundaryId"`
+- **Layout**: People at top, system in scope in center, external systems around edges/bottom
+- **Canvas size**: `pageWidth="1600" pageHeight="1200"` minimum; increase for complex diagrams
+
+#### Spacing and layout rules (critical)
+
+These rules prevent overlapping text and unreadable diagrams:
+
+- **140px minimum** between context elements (people, external containers) and the boundary top, so edge labels don't overlap the boundary title text.
+- **200px minimum** between boundary bottom and external systems below, so edge labels between them are readable and don't overlap the boundary line.
+- **150px minimum gap** between sibling elements in the same row (containers, components), so relationship labels between them have room.
+- **Center contents inside boundaries**: calculate total content width, subtract from boundary width, divide by 2 for left margin. Do this for each row independently.
+- **Use explicit waypoints** (`<Array as="points">`) to route edges around obstacles. Waypoints should run through the gap between the boundary edge and external systems.
+
+#### Edge routing rules
+
+- **Never route an edge through a component or container.** If an edge would cross an element, spread elements apart to create a clear lane, or reroute the edge with waypoints.
+- **Edges that skip rows must route around, not through.** If a service in row 2 connects to an external system below the boundary, and row 3 (data stores) is in between, the edge must route along the left or right margin of the boundary and exit from the bottom — never vertically through row 3. Use waypoints to hug the boundary edge.
+- **Cross-row edges** (e.g., connecting elements that aren't adjacent) must route above or below the intervening row using waypoints, not through it.
+- **Edges exiting a boundary** to reach external systems should use waypoints at intermediate Y positions between the boundary bottom and the external system top, with each edge at a different Y to avoid overlapping.
+- **All edges must use proper `source` and `target` attributes** referencing element IDs. Never use `sourcePoint`/`targetPoint` manual coordinates — these create disconnected lines and make the audit untraceable.
+
+#### Level 1: System Context
+
+Show the software system as a single box, surrounded by users and external systems.
+
+#### Level 2: Container
+
+Zoom into the system boundary (dashed swimlane) to show containers with technology choices. Include people and external systems outside the boundary for context.
+
+#### Level 3: Component
+
+Zoom into a single container (dashed swimlane) to show its components. Include surrounding containers and external systems for context.
+
+#### Deployment Diagram (if the user asks about infrastructure)
+
+Show deployment nodes (nested boxes) containing containers, mapped to infrastructure.
+
+### Phase 4: Layout Audit (mandatory — do this before saving)
+
+After generating the initial diagram XML, perform this audit. If any check fails, adjust the layout and re-check. Repeat until all checks pass.
+
+#### 4a. Trace every edge path
+
+For each edge, compute the full path by listing the absolute coordinates of every point: source exit point → each waypoint → target entry point. Each pair of consecutive points forms a segment. For each segment, check if the line intersects the bounding box (x, y, width, height) of every element in the diagram. If any segment crosses through any element:
+- Route the edge around the element using waypoints along the boundary margin, OR
+- Spread elements apart to create a clear lane
+
+Special attention: edges connecting to external systems below the boundary often have a long vertical segment dropping through lower rows. These MUST route along the left or right margin first, then drop below the boundary, then turn horizontally to reach the target.
+
+#### 4b. Check edge label positions for overlaps
+
+draw.io places edge labels at the midpoint of the longest segment of an orthogonal edge. For each edge label:
+1. Identify the longest segment of the edge path (the segment with the greatest length).
+2. The label will be centered on this segment's midpoint.
+3. Estimate the label's bounding box conservatively: use **12px per character width** and **20px per line height**, centered on the midpoint. Over-estimating is better than under-estimating — a little extra spacing is preferable to overlapping text.
+4. Check if this bounding box overlaps with:
+   - Any element (container, component, person, external system)
+   - The boundary title text (the swimlane header area)
+   - Any other edge label's estimated bounding box
+
+**Critical:** If the longest segment is a vertical drop adjacent to a container (e.g., the last segment before the target), the label will render on top of that neighboring container. To fix this, add waypoints so that the longest segment is in open space — typically a horizontal run through a clear gap between rows.
+
+If an overlap is found:
+- Add waypoints to change which segment is longest, moving where the label lands, OR
+- Shift the edge's exit/entry points to move the longest segment to a clear area, OR
+- Increase spacing between the elements the edge connects
+
+#### 4c. Check parallel and bundled edges
+
+**No two edges may share the same waypoint Y or X value.** When edges share a coordinate, draw.io renders them on top of each other, making it impossible to trace which line goes where. For every pair of edges:
+- Each edge must use a **unique waypoint Y** (for horizontal runs) or **unique waypoint X** (for vertical runs), separated by at least **30px**.
+- Source exit points from the same element must be spread across different positions (e.g., exitX=0.2 vs exitX=0.5 vs exitX=0.8). Never use the same exit point for two edges.
+- Target entry points on the same element must also be spread (e.g., entryX=0.3 vs entryX=0.7).
+- If two edges run parallel for any segment, offset one of them by at least 30px using additional waypoints.
+
+Labels on parallel edges will also overlap if the edges are close. Since labels are centered on the longest segment, ensure the longest segments of nearby edges are in **different spatial areas** — either at different Y levels or with enough horizontal separation (at least 200px between label centers).
+
+#### 4d. Verify boundary title clearance
+
+Check that no edge or edge label passes through or overlaps the boundary's swimlane header (the `startSize` area, typically 30px tall at the top of the boundary). If edges enter the boundary from above, they should connect at the boundary top edge, not route through the title.
+
+### Phase 5: Output
+
+1. **Save the draw.io XML** to the project root (or current directory):
+   - Filenames: `system-context.drawio`, `containers.drawio`, `components-[name].drawio`
+
+2. **Export to PNG** by running `drawio --export --format png --output <name>.png <name>.drawio` via Bash.
+
+3. **Open in draw.io desktop** by running `drawio <filepath>` via Bash.
+
+4. **Offer next steps:**
+   - "Want me to zoom into any container for a component diagram?"
+   - "Want to add a deployment diagram?"
+   - "Anything to adjust?"
+
+## Guidelines
+
+- **Be conversational, not bureaucratic.** Don't force the user through a rigid questionnaire. Adapt based on what they share.
+- **Infer where reasonable.** If the user says "React frontend with a Node API and Postgres," you don't need to ask what the database technology is.
+- **Be specific in element descriptions.** "Provides product search, cart management, and checkout" beats "Handles business logic."
+- **Include technology choices** even during upfront design. If the user hasn't decided, note options: "Database [PostgreSQL or MySQL — TBD]".
+- **Annotate every relationship** with purpose and protocol/mechanism.
+- **Don't mix abstraction levels.** Keep each diagram at one level of the C4 hierarchy.
+- **Use prepositions in relationship labels** that match arrow direction: "reads from," "sends to," "makes API calls to."
+- **For large systems:** Start with System Context, then Container. Offer Component diagrams for the most interesting containers rather than trying to diagram everything.
+- **Iterate.** After generating a diagram, ask if it looks right. The user often has corrections or additions after seeing the first version.
diff --git a/.claude/commands/codify.md b/.claude/commands/codify.md
new file mode 100644
index 0000000..4a0b5e6
--- /dev/null
+++ b/.claude/commands/codify.md
@@ -0,0 +1,196 @@
+---
+name: codify
+description: Codify concrete, actionable insights from recent session work into the project's `CLAUDE.md` so they survive across sessions and compound over time. Harvests patterns that worked, anti-patterns that bit, API gotchas, specific thresholds, and verification checks. Filters against quality gates (atomic, evidence-backed, non-redundant, verifiable, safe, stable). Writes into a dedicated `## Codified Insights` section rather than scattering entries. Use after a productive session, a bug fix that revealed a non-obvious pattern, or an explicit review where you want learnings preserved as rules. Supports `--dry-run` to preview, `--max=N` to cap output, `--target=<path>` to write elsewhere, `--section=<name>` to override the destination section. Flags insights that look cross-project and suggests promotion to `~/code/rulesets/claude-rules/` instead. Do NOT use for session wrap-up / progress summaries (not insights), for private personal context (auto-memory handles that, not a tracked file), or for formal rules that belong in `.claude/rules/`. Informed by Agentic Context Engineering (ACE, arXiv:2510.04618) — grow-and-refine without context collapse.
+---
+
+# Codify
+
+Turn transient session insights into durable, actionable entries in the project's `CLAUDE.md`. Each invocation should make the next session measurably better without diluting what's already there. Select carefully, phrase specifically, commit deliberately.
+
+## When to Use
+
+- End of a productive session where you want concrete patterns preserved
+- After a bug fix that revealed a non-obvious constraint or gotcha
+- After a review where a pattern was identified as worth repeating (or avoiding)
+- When you notice yourself re-deriving the same insight — it belongs written down
+
+## When NOT to Use
+
+- For a session wrap-up summary (that's narrative, not an insight)
+- For personal/private context (auto-memory at `~/.claude/projects/<cwd>/memory/` captures that — see below)
+- For formal project rules — those belong in `.claude/rules/*.md`
+- When you have no specific, evidence-backed insight yet — skip rather than fabricate
+
+## Relationship to Other Memory Systems
+
+Three distinct systems, zero overlap:
+
+| System | Location | Scope | Purpose |
+|---|---|---|---|
+| **auto-memory** | `~/.claude/projects/<cwd>/memory/` | Private, per-working-directory | Session-bridging context about the user and project (feedback, user traits, project state). Written continuously by the agent. |
+| **`/codify` (this skill)** | Project `CLAUDE.md` | Public, tracked, per-project | Explicit, curated rules and patterns. Written deliberately by the user invocation. |
+| **Formal rules** | `.claude/rules/*.md`, `~/code/rulesets/claude-rules/` | Public, tracked, per-project or global | Stable policy (style, conventions, verification). Authored once, rarely updated. |
+
+Use the right system for the right content.
+
+## Workflow
+
+Four phases. Each can be skipped if it has no content; none should be silently merged.
+
+### Phase 1 — Harvest
+
+Identify candidate insights from recent work. Look at:
+
+- The session transcript (or files referenced by `--source`)
+- Recent commits and their messages
+- Any `.architecture/evaluation-*.md` from `arch-evaluate`
+- Reflection or critique outputs if they exist
+- Anti-patterns you caught yourself falling into
+
+**Extract only:**
+
+- **Patterns that worked** — preferably with a minimum precondition and a worked example or reference
+- **Anti-patterns that bit** — with the observable symptom and the reason
+- **API / tool gotchas** — auth quirks, rate limits, idempotency, error codes
+- **Verification items** — concrete checks that would catch regressions next time
+- **Specific thresholds** — "pagination above 50 items" not "pagination when needed"
+
+**Exclude:**
+
+- Progress narrative ("today we shipped X")
+- Personal preferences ("I like functional style")
+- Vague aphorisms ("write good code")
+- Unverified claims (if you can't cite code, docs, or repeated observation, skip)
+
+### Phase 2 — Filter (Grow-and-Refine)
+
+For each candidate insight, apply these gates. Fail any → drop the entry.
+
+- **Actionable.** A reader could apply this immediately. "Write good code" fails; "For dataset lookups under ~100 items, Object outperforms Map in V8" passes.
+- **Specific.** Names a threshold, a file, a flow, a version, or a named tool. Generic insights are noise.
+- **Evidence-backed.** Derived from code you just read, docs you just verified, or a pattern observed more than once. Speculation doesn't count.
+- **Atomic.** One idea per bullet. If the insight has two distinct parts, it's two bullets.
+- **Non-redundant.** Check existing `CLAUDE.md` content. If something similar exists, prefer merging or skipping over duplicating. If the new one is genuinely more specific and evidence-backed than the existing one, append it and mark the older one with `(candidate for consolidation)` — don't auto-delete prior user content.
+- **Safe.** No secrets, tokens, private URLs, or PII. Nothing that would leak in a public commit.
+- **Stable.** Prefer patterns that'll remain valid. If version-specific, say so.
+
+### Phase 3 — Write
+
+Write approved insights to a dedicated section of `CLAUDE.md`. Default section name: **`## Codified Insights`**. Override with `--section=<name>`.
+
+**Discipline:**
+
+- **One section only.** Don't scatter entries across CLAUDE.md. All codified content in one place means future `/codify` runs and human readers find it fast.
+- **Create the section if absent.** Place it near the end of CLAUDE.md, before any footer links.
+- **Preserve chronology within the section.** Newer entries appended; don't shuffle.
+- **Include provenance.** Each entry gets a date and, where useful, a one-word source hint (`pattern:`, `gotcha:`, `threshold:`, `anti-pattern:`, `verify:`).
+
+**Entry format:**
+
+```markdown
+- **<short title or rule>.** <One or two sentences. Concrete. Actionable.> (`<source-hint>` — YYYY-MM-DD)
+```
+
+Examples:
+
+```markdown
+- **Pagination threshold.** Fetch endpoints returning >50 items must paginate; clients assume everything ≤50 is complete. (`threshold` — 2026-04-19)
+- **Map vs Object for small lookups.** In V8, Object outperforms Map for <~100 keys; Map wins at 10k+. Use Object for hot config lookups. (`pattern` — 2026-04-19)
+- **Never log `load-file-name` from batch-compile context.** Both `load-file-name` and `buffer-file-name` are nil during top-level evaluation; `file-name-directory nil` raises `stringp, nil`. (`gotcha` — 2026-04-19)
+```
+
+### Phase 4 — Validate
+
+After writing, check:
+
+- [ ] Every entry passed all Phase 2 gates
+- [ ] Each entry is atomic (one idea)
+- [ ] No near-duplicates were created
+- [ ] The `## Codified Insights` section is coherent — entries flow, categories aren't interleaved randomly
+
+If the validation surfaces a problem, fix before exiting.
+
+## Cross-Project Promotion
+
+Some insights apply to *all* your projects, not just this one. Examples:
+
+- "Always emit JSON with a stable key order for git diffs"
+- "For TypeScript libraries, expose types via `package.json#exports`"
+
+When an insight reads as general rather than project-specific, the skill emits a **promotion hint** at the end of the run:
+
+```
+Promotion candidates:
+- "JSON stable key order" — reads as general. Consider adding to:
+    ~/code/rulesets/claude-rules/style.md
+  (would apply to every project via global install)
+
+Keep as project-specific, promote, or drop? [k/p/d]
+```
+
+Promotion happens manually — the skill doesn't edit the rulesets repo automatically. The hint is a nudge to think about scope.
+
+## Arguments
+
+- **`--dry-run`** — show the proposed entries and where they'd be written; do not modify any files.
+- **`--max=N`** — cap output to the top N insights by specificity + evidence.
+- **`--target=<path>`** — write to a different file. Defaults to `./CLAUDE.md`. Use e.g. `docs/learnings.md` if the project prefers a separate file.
+- **`--section=<name>`** — override the default `## Codified Insights` section name.
+- **`--source=<spec>`** — scope what gets harvested. Values: `last` (most recent message), `selection` (a user-highlighted region if supported), `chat:<id>` (a specific past conversation), `commits:<range>` (e.g., `commits:HEAD~10..`). Defaults to a reasonable window of recent session context.
+
+## Output
+
+On a real run (not `--dry-run`):
+
+1. Short summary — "added N entries to `<target>`: X patterns, Y gotchas, Z thresholds."
+2. Any promotion candidates flagged for global-rules consideration.
+3. Confirmation of the file path modified.
+
+On `--dry-run`:
+
+1. Preview of each proposed entry with the section it would land in.
+2. Flagged promotions.
+3. Explicit confirmation nothing was written.
+
+## Anti-Patterns
+
+- **Summarizing the session instead of extracting insights.** "Today we refactored X" is narrative. "X's public API requires parameters in Y order due to Z" is an insight.
+- **Writing entries without evidence.** If you can't point to code, docs, or multiple observations, the entry is speculation.
+- **Overwriting prior content.** Mark conflicts for consolidation; don't auto-delete what the user wrote.
+- **Scattering entries.** One section. Grep-able. Coherent.
+- **Batch-writing 20 entries in one session.** If the session generated 20 real insights, many of them aren't. Filter harder. 3-5 genuine entries per run is typical.
+- **Adding to `CLAUDE.md` when auto-memory is the right system.** Private user-context goes in auto-memory; public project rules go here; static policy goes in `.claude/rules/`.
+- **Promoting too eagerly.** "This applies to all projects" is a strong claim. If you can't name three unrelated projects where the rule would fire, it's project-specific.
+
+## Review Checklist
+
+Before accepting the additions:
+
+- [ ] Each entry has a source hint and a date
+- [ ] Each entry passes the atomic / specific / evidence-backed / non-redundant / safe / stable gates
+- [ ] The `## Codified Insights` section exists exactly once in the target
+- [ ] Promotion candidates (if any) have been flagged, not silently promoted
+- [ ] `--dry-run` was used at least once if the target file is under active template management (e.g. bundled CLAUDE.md from rulesets install)
+
+## Maintenance
+
+`CLAUDE.md` accumulates over time. Periodically:
+
+- **Consolidate.** Entries marked `(candidate for consolidation)` deserve a look. Often the older entry is superseded; sometimes it's a special case of the newer one.
+- **Retire.** Entries about deprecated tools or obsolete versions should be removed explicitly (with a commit message noting the retirement).
+- **Promote.** Re-scan for entries that have started firing across multiple projects — those belong in `~/code/rulesets/claude-rules/`.
+- **Trim.** If the section grows past ~40 entries, either the project is complex enough to warrant splitting `CLAUDE.md` into multiple files, or the entries haven't been curated aggressively enough.
+
+## Theoretical Background
+
+The grow-and-refine / evidence-backed approach draws on Agentic Context Engineering (Zhang et al., *Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models*, arXiv:2510.04618). Key ideas borrowed:
+
+- **Generation → Reflection → Curation** as distinct phases, not a single compression step
+- **Grow-and-refine** — accumulate granular knowledge rather than over-compressing to vague summaries
+- **Avoiding context collapse** — resist the temptation to rewrite old entries into smoother prose; specificity is the value
+
+This skill implements the Curation phase. Reflection and Generation happen in the main conversation.
+
+## Content scope
+
+Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/.claude/commands/create-v2mom.md b/.claude/commands/create-v2mom.md
new file mode 100644
index 0000000..9d2beef
--- /dev/null
+++ b/.claude/commands/create-v2mom.md
@@ -0,0 +1,426 @@
+---
+name: create-v2mom
+description: Create a V2MOM (Vision, Values, Methods, Obstacles, Metrics) strategic framework for any project, goal, or domain — personal infrastructure, business strategy, health goals, financial planning, software development, career planning, or life goals. Walks the user through the five sections in order: Vision first (aspirational picture of success), then Values (2-4 principles that guide decisions), Methods (4-7 prioritized approaches with concrete actions), Obstacles (honest personal and external challenges), and Metrics (measurable outcomes, not vanity metrics). Includes an optional task-migration phase that consolidates an existing todo list under the defined Methods. Use when the user asks to create a V2MOM, build a strategic plan, set goals for a significant project, or apply ruthless prioritization without an existing framework. Do NOT use for simple todo lists, single-decision prompts (use /arch-decide), quick task brainstorming (use /brainstorm), or daily/weekly planning of routine tasks. Produces a document that becomes the project's decision-making source of truth.
+---
+
+# /create-v2mom — Create a V2MOM Strategic Framework
+
+Transform vague intentions into a concrete action plan using V2MOM: Vision, Values, Methods, Obstacles, Metrics. Originated at Salesforce; works for any domain.
+
+## Problem Solved
+
+Without a strategic framework, projects suffer from:
+
+- **Unclear direction** — "get healthier" / "improve my finances" is too vague to act on; every idea feels equally important; no principled way to say "no."
+- **Priority inflation** — everything feels urgent; research/planning without execution; active todo list grows beyond manageability.
+- **No decision framework** — debates about A vs B waste time; second-guessing after decisions; perfectionism masquerading as thoroughness.
+- **Unmeasurable progress** — can't tell if work is actually making things better; no objective "done" signal; vanity metrics only.
+
+## When to Use
+
+- Starting a significant project (new business, new habit, new system)
+- Existing project has accumulated many competing priorities without clear focus
+- You find yourself constantly context-switching between ideas
+- Someone asks "what are you trying to accomplish?" and the answer is vague
+- Annual or quarterly planning for ongoing projects or life goals
+
+Particularly valuable for: personal infrastructure (tooling, systems, workflows), health and fitness, financial planning, software package development, business strategy, career development.
+
+## Exit Criteria
+
+The V2MOM is complete when:
+
+1. **All 5 sections have concrete content:**
+   - Vision: Clear, aspirational picture of success
+   - Values: 2-4 principles that guide decisions
+   - Methods: 4-7 concrete approaches with specific actions
+   - Obstacles: Honest personal/technical challenges
+   - Metrics: Measurable outcomes (not vanity metrics)
+2. **It's useful for decision-making:** can answer "does X fit this V2MOM?" quickly; provides priority clarity (Method 1 > Method 2 > etc.); identifies what NOT to do.
+3. **Both parties agree it's ready:** feels complete not rushed; actionable enough to start execution; honest about obstacles (not sugar-coated).
+
+**Validation questions:**
+- Can you articulate the vision in one sentence?
+- Do the values help you say "no" to things?
+- Are methods ordered by priority?
+- Can you immediately identify 3-5 tasks from Method 1?
+- Do metrics tell you if you're succeeding?
+
+## Instructions
+
+Complete phases in order. Vision informs Values. Values inform Methods. Methods reveal Obstacles. Everything together defines Metrics.
+
+### Phase 1: Ensure shared understanding of the framework
+
+Confirm both parties understand what each section means:
+
+- **Vision:** What you want to achieve (aspirational, clear picture of success)
+- **Values:** Principles that guide decisions (2-4 values, defined concretely)
+- **Methods:** How you'll achieve the vision (4-7 approaches, ordered by priority)
+- **Obstacles:** What's in your way (honest, personal, specific)
+- **Metrics:** How you'll measure success (objective, not vanity metrics)
+
+### Phase 2: Create the document structure
+
+1. Create file: `docs/[project-name]-v2mom.org` or appropriate location.
+2. Add metadata: `#+TITLE`, `#+AUTHOR`, `#+DATE`, `#+FILETAGS`.
+3. Create section headings for all 5 components.
+4. Add a "What is V2MOM?" overview section at top.
+
+Save incrementally after each section — V2MOM discussions can run long.
+
+### Phase 3: Define the Vision
+
+**Ask:** "What do you want to achieve? What does success look like?"
+
+**Goal:** Clear, aspirational picture. 1-3 paragraphs describing the end state.
+
+**Your role:**
+- Help articulate what's described
+- Push for specificity ("works great" → what specifically works?)
+- Identify scope (what's included, what's explicitly out)
+- Capture concrete examples the user mentions
+
+**Good vision characteristics:**
+- Paints a picture you can visualize
+- Describes outcomes, not implementation
+- Aspirational but grounded in reality
+- Specific enough to know what's included
+
+**Examples across domains:**
+- Health: "Wake up with energy, complete a 5K without stopping, feel strong in daily activities, stable mood throughout the day."
+- Finance: "Six months emergency fund, debt-free except mortgage, automatic retirement savings, financial decisions that don't cause anxiety."
+- Software: "A package that integrates seamlessly, has comprehensive documentation, handles edge cases gracefully, that maintainers of other packages want to depend on."
+
+**Time estimate:** 15-30 minutes if mostly clear; 45-60 minutes if it needs exploration.
+
+### Phase 4: Define the Values
+
+**Ask:** "What principles guide your decisions? When faced with A vs B, what values help you decide?"
+
+**Goal:** 2-4 values with concrete definitions and examples.
+
+**Your role:**
+- Suggest values based on vision discussion
+- Push for concrete definitions (not just the word, but what it MEANS)
+- Help distinguish between overlapping values
+- Identify when examples contradict stated values
+
+**Common pitfall:** Listing generic words without defining them.
+- Bad: "Quality, Speed, Innovation"
+- Good: "Sustainable means can maintain this for 10+ years without burning out. No crash diets, no 80-hour weeks, no technical debt I can't service."
+
+**For each value, capture:**
+1. The value name (1-2 words)
+2. Definition (what it means in context of this project)
+3. Concrete examples (how it manifests)
+4. What breaks this value (anti-patterns)
+
+**Method:** Start with 3-5 candidates. For each, ask "what does [value] mean to you in this context?" Discuss until the definition is concrete. Refine, merge, remove until 2-4 remain.
+
+**Examples:**
+- Health: "Sustainable: Can do this at 80 years old. No extreme diets. Focus on habits that compound over decades."
+- Finance: "Automatic: Set up once, runs forever. Don't rely on willpower for recurring decisions."
+- Software: "Boring: Use proven patterns. No clever code. Maintainable by intermediate developers. Boring is reliable."
+
+**Time estimate:** 30-45 minutes.
+
+### Phase 5: Define the Methods
+
+**Ask:** "How will you achieve the vision? What approaches will you take?"
+
+**Goal:** 4-7 methods (concrete approaches) ordered by priority.
+
+**Your role:**
+- Extract methods from vision and values discussion
+- Help order by priority (what must happen first?)
+- Ensure methods are actionable (not just categories)
+- Push for concrete actions under each method
+- Watch for method ordering that creates dependencies
+
+**Structure for each method:**
+1. Method name (verb phrase: "Build X", "Eliminate Y", "Establish Z")
+2. Aspirational description (1-2 sentences: why this matters)
+
+**Method ordering matters:**
+- Method 1 should be highest priority (blocking everything else)
+- Lower-numbered methods should enable higher-numbered ones
+- Common ordering patterns:
+  - Fix → Stabilize → Build → Enhance → Sustain
+  - Eliminate → Replace → Optimize → Automate → Maintain
+  - Learn → Practice → Apply → Teach → Systematize
+
+**Examples:**
+
+Health:
+- Method 1: Eliminate Daily Energy Drains (fix sleep, reduce inflammatory foods, address deficiencies)
+- Method 2: Build Baseline Strength (3x/week resistance, progressive overload, compound movements)
+- Method 3: Establish Sustainable Nutrition (meal prep, protein targets, vegetable servings)
+
+Finance:
+- Method 1: Stop the Bleeding (eliminate wasteful subscriptions, high-interest debt, impulse purchases)
+- Method 2: Build the Safety Net (automate savings, reach $1000 fund, then 3 months expenses)
+- Method 3: Invest for the Future (max 401k match, open IRA, automatic contributions)
+
+Software Package:
+- Method 1: Nail the Core Use Case (solve one problem extremely well, clear docs, handle errors gracefully)
+- Method 2: Ensure Quality and Stability (comprehensive tests, CI/CD, semantic versioning)
+- Method 3: Build Community and Documentation (contribution guide, examples, responsive to issues)
+
+**Ordering is flexible until it isn't:** After defining all methods, you may realize the ordering is wrong. Swap them. The order represents priority — getting it right matters more than preserving the initial draft.
+
+**Time estimate:** 45-90 minutes (longest section).
+
+### Phase 5.5: Brainstorm tasks for each method
+
+For each method, brainstorm what's missing to achieve it.
+
+**Ask:** "What else would help achieve this method's goal?"
+
+**Your role:**
+- Suggest additional tasks based on the method's aspirational description
+- Consider edge cases and error scenarios
+- Identify automation opportunities
+- Propose monitoring/visibility improvements
+- Challenge if the list feels incomplete (can't reach the goal)
+- Challenge if the list feels bloated (items don't contribute to the goal)
+- Create sub-tasks for items with multiple steps
+- Ensure priorities reflect contribution to the method's goal
+
+**For each brainstormed task:**
+- Describe what it does and why it matters
+- Assign priority based on contribution to the method
+- Add technical details if known
+- Get user agreement before adding
+
+**Priority system (org-mode):**
+- `[#A]` Critical blockers — must be done first, blocks everything else
+- `[#B]` High-impact reliability — directly enables the method goal
+- `[#C]` Quality improvements — valuable but not blocking
+- `[#D]` Nice-to-have — low priority, can defer
+
+**Time estimate:** 10-15 minutes per method (~50-75 min for 5 methods).
+
+### Phase 6: Identify the Obstacles
+
+**Ask:** "What's in your way? What makes this hard?"
+
+**Goal:** Honest, specific obstacles — both personal and technical/external.
+
+**Your role:**
+- Encourage honesty (obstacles are reality, not failures)
+- Help distinguish symptoms from root causes
+- Identify patterns in behavior that create obstacles
+- Acknowledge challenges without judgment
+
+**Good obstacle characteristics:**
+- Honest about personal patterns
+- Specific, not generic
+- Acknowledges both internal and external obstacles
+- States real stakes (not just "might happen")
+
+**Common obstacle categories:**
+- Personal: perfectionism, hard to say no, gets bored, procrastinates
+- Knowledge: missing skills, unclear how to proceed, need to learn
+- External: limited time, limited budget, competing priorities
+- Systemic: environmental constraints, missing tools, dependencies on others
+
+**For each obstacle:** name it clearly, describe how it manifests in this project, acknowledge the stakes (what happens because of it).
+
+**Examples:**
+
+Health:
+- "I get excited about new workout programs and switch before seeing results (pattern: 6 weeks into a program)"
+- "Social events involve food and alcohol — saying no feels awkward and isolating"
+- "When stressed at work, I skip workouts and eat convenient junk food"
+
+Finance:
+- "Viewing budget as restriction rather than freedom — triggers rebellion spending"
+- "FOMO on lifestyle experiences my peers have"
+- "Limited financial literacy — don't understand investing beyond 'put money in account'"
+
+Software:
+- "Perfectionism delays releases — always 'one more feature' before v1.0"
+- "Maintaining documentation feels boring compared to writing features"
+- "Limited time (2-4 hours/week) and competing projects"
+
+**Time estimate:** 15-30 minutes.
+
+### Phase 7: Define the Metrics
+
+**Ask:** "How will you measure success? What numbers tell you if this is working?"
+
+**Goal:** 5-10 metrics — objective, measurable, aligned with vision/values/methods.
+
+**Your role:**
+- Suggest metrics based on vision, values, methods
+- Push for measurable numbers (not "better" — concrete targets)
+- Identify vanity metrics (look good but don't measure real progress)
+- Ensure metrics align with values and methods
+
+**Metric categories:**
+- **Performance** — measurable outcomes of the work
+- **Discipline** — process adherence, consistency, focus
+- **Quality** — standards maintained, sustainability indicators
+
+**Good metric characteristics:**
+- Objective (not subjective opinion)
+- Measurable (can actually collect the data)
+- Actionable (can change behavior to improve it)
+- Aligned with values and methods
+
+**For each metric, capture:** name, current state (if known), target state, how to measure, measurement frequency.
+
+**Examples:**
+
+Health:
+- Resting heart rate: 70 bpm → 60 bpm (daily via fitness tracker)
+- Workout consistency: 3x/week strength training for 12 consecutive weeks
+- Sleep quality: 7+ hours per night 6+ nights per week (sleep tracker)
+- Energy rating: subjective 1-10 scale, target 7+ weekly average
+
+Finance:
+- Emergency fund: $0 → $6000 (monthly)
+- High-interest debt: $8000 → $0 (monthly)
+- Savings rate: 5% → 20% of gross income (monthly)
+- Financial anxiety: weekly check-in, target "comfortable with financial decisions"
+
+Software:
+- Test coverage: 0% → 80% (coverage tool)
+- Issue response time: median < 48 hours (GitHub stats)
+- Documentation completeness: all public APIs documented with examples
+- Adoption: 10+ GitHub stars, 3+ projects depending on it
+
+**Time estimate:** 20-30 minutes.
+
+### Phase 8 (optional): Migrate existing tasks
+
+If there's an existing `TODO.org` or task list, migrate it under the V2MOM methods.
+
+**Goal:** Consolidate all project tasks under V2MOM methods, eliminate duplicates, move non-fitting items to someday-maybe.
+
+**Process:**
+
+1. **Identify duplicates** — read existing TODO, find tasks already in V2MOM methods, check if V2MOM task has all technical details from the TODO version, enhance if needed, mark original for deletion.
+2. **Map tasks to methods** — for each remaining task, ask "which method does this serve?" Add under appropriate method with priority. Preserve task state (DOING, VERIFY, etc.).
+3. **Review someday-maybe candidates one-by-one** — for each task that doesn't fit methods, ask: keep in V2MOM (which method)? Move to someday-maybe? Delete?
+4. **Final steps** — append someday-maybe items to `docs/someday-maybe.org`; copy completed V2MOM to TODO.org (overwriting). V2MOM becomes the single source of truth.
+
+**Keep in V2MOM:** DOING tasks (work in progress), VERIFY tasks (need testing/verification), tasks that enable method goals.
+
+**Move to someday-maybe:** Doesn't directly serve a method's goal; nice-to-have without clear benefit; research task without actionable outcome; architectural change decided not to pursue; unrelated personal task.
+
+**Delete entirely:** Obsolete tasks (feature removed, problem solved elsewhere); duplicate of something done; task that no longer makes sense.
+
+**Review one task at a time** — don't batch. Capture reasoning.
+
+**Time estimate:** Variable — small (~20 tasks) 30-45 min; medium (~50) 60-90 min; large (100+) 2-3 hours.
+
+This phase is optional — only needed if an existing todo list has substantial content.
+
+### Phase 9: Review and refine
+
+Once all sections are complete, review the whole V2MOM together:
+
+1. **Does the vision excite you?** (If not, why not? What's missing?)
+2. **Do the values guide decisions?** (Can you use them to say no to things?)
+3. **Are the methods ordered by priority?** (Is Method 1 truly most important?)
+4. **Are the obstacles honest?** (Or are you sugar-coating?)
+5. **Will the metrics tell you if you're succeeding?** (Or are they vanity metrics?)
+6. **Does this V2MOM make you want to DO THE WORK?** (If not, something is wrong.)
+
+**Refinement:** merge overlapping methods; reorder methods if priorities are wrong; add missing concrete actions; strengthen weak definitions; remove fluff.
+
+**Red flags:**
+- Vision doesn't excite you → Need to dig deeper into what you really want
+- Values are generic → Need concrete definitions and examples
+- Methods have no concrete actions → Too vague, need specifics
+- Obstacles are all external → Need honesty about personal patterns
+- Metrics are subjective → Need objective measurements
+
+### Phase 10: Commit and use
+
+1. Save the document in its appropriate location.
+2. Share with stakeholders (if applicable).
+3. Use it immediately — start Method 1 execution or the first triage.
+4. Schedule first review (1 week out): is this working?
+
+Use immediately to validate the V2MOM is practical, not theoretical. Execution reveals gaps that discussion misses.
+
+## Principles
+
+### Honesty over aspiration
+
+V2MOM requires brutal honesty, especially in Obstacles.
+
+- "I get bored after 6 weeks" (honest) vs "Maintaining focus is challenging" (bland)
+- "I have 3 hours per week max" (honest) vs "Time is limited" (vague)
+- "I impulse-spend when stressed" (honest) vs "Budget adherence needs work" (passive)
+
+**Honesty enables solutions.** If you can't name the obstacle, you can't overcome it.
+
+### Concrete over abstract
+
+Every section should have concrete examples and definitions.
+
+**Bad:** Vision "be successful" · Values "Quality, Speed, Innovation" · Methods "improve things" · Metrics "do better"
+
+**Good:** Vision "Complete a 5K in under 30 min, have energy to play with kids after work, sleep 7+ hours consistently" · Values "Sustainable: can maintain for 10+ years, no crash diets, no injury-risking overtraining" · Methods "Method 1: Fix sleep quality (blackout curtains, consistent bedtime, no screens 1hr before bed)" · Metrics "5K time: current 38 min → target 29 min (measure: monthly timed run)"
+
+### Priority ordering is strategic
+
+Method ordering determines what happens first. Get it wrong and you'll waste effort.
+
+Common patterns:
+- **Fix → Build → Enhance → Sustain** (eliminate problems before building)
+- **Eliminate → Replace → Optimize** (stop damage before improving)
+- **Learn → Practice → Apply → Teach** (build skill progressively)
+
+Method 1 must address the real blocker — if the foundation is broken, nothing built on it will hold; high-impact quick wins build momentum; must stop the bleeding before rehab.
+
+### Methods need concrete actions
+
+If you can't list 3-8 concrete actions for a method, it's too vague.
+
+**Test:** Can you start working on Method 1 immediately after completing the V2MOM? If the answer is "I need to think about what to do first," the method needs more concrete actions.
+
+- Too vague: "Method 1: Improve health"
+- Concrete: "Method 1: Fix sleep quality → blackout curtains, consistent 10pm bedtime, no screens after 9pm, magnesium supplement, sleep tracking"
+
+### Metrics must be measurable
+
+"Better" is not a metric. "Bench press 135 lbs" is a metric.
+
+For each metric, you must be able to answer:
+1. How do I measure this? (exact method or tool)
+2. What's the current state?
+3. What's the target state?
+4. How often do I measure it?
+5. What does this metric actually tell me?
+
+If you can't answer these, it's not a metric yet.
+
+### V2MOM is a living document
+
+Not set in stone. As you execute, expect: method reordering (new info reveals priorities), metric adjustments (too aggressive or too conservative), new obstacles emerging, refined value definitions.
+
+**Update when:** major priority shift occurs; new obstacle emerges that changes approach; metric targets prove unrealistic or too easy; method completion opens new possibilities; quarterly review reveals misalignment.
+
+**But not frivolously:** Changing the V2MOM every week defeats the purpose. Update on major shifts, not minor tactics.
+
+### Use it or lose it
+
+V2MOM only works if you use it for decisions.
+
+Use it for:
+- Weekly reviews (am I working on the right things?)
+- Priority decisions (which method does this serve?)
+- Saying no to distractions (not in the methods)
+- Celebrating wins (shipped Method 1 items)
+- Identifying blockers (obstacles getting worse?)
+
+If 2 weeks pass without referencing the V2MOM, something is wrong — either the V2MOM isn't serving you, or you're not using it.
+
+## Closing Test
+
+Can you say "no" to something you would have said "yes" to before? If so, the V2MOM is working.
diff --git a/.claude/commands/finish-branch.md b/.claude/commands/finish-branch.md
new file mode 100644
index 0000000..0903010
--- /dev/null
+++ b/.claude/commands/finish-branch.md
@@ -0,0 +1,251 @@
+---
+name: finish-branch
+description: Complete a feature branch with a forced-choice menu of outcomes (merge locally / push + open PR / keep as-is / discard). Runs verification before offering options (tests, lint, typecheck per the project's conventions — delegates to `verification.md`). Requires typed confirmation for destructive deletion (no accidental work loss). Handles git worktree cleanup correctly: tears down for merge and discard, preserves for keep and push (where the worktree may still be needed for follow-up review or fixes). References existing rules for commit conventions (`commits.md`), review discipline (`review-code`), and verification (`verification.md`) — this skill is the workflow scaffold, not a re-teach of the underlying standards. Use when implementation is complete and you need to wrap up a branch. Do NOT use for mid-development merges (that's normal git flow), for the wrap-up *of a whole session* (different scope — session-end is narrative + handoff, not branch integration), for creating a new branch (no skill for that — just `git checkout -b`), or when review hasn't happened yet (run `/review-code` first, then this).
+---
+
+# /finish-branch
+
+Complete a development branch cleanly. Verify, pick one of four outcomes, execute, clean up.
+
+## When to Use
+
+- Implementation is done, tests are written, you believe the branch is ready to move forward
+- You're about to ask "okay, now what?" — this skill exists to stop you asking and start you choosing
+- You need to integrate, preserve, or discard a branch's work deliberately
+
+## When NOT to Use
+
+- Mid-development merges — that's regular git flow, not "finishing" a branch
+- Full session wrap-up (closing the day's work, recording context for next time) — different scope, not about branch integration
+- Creating a new branch — just `git checkout -b <name>`
+- Before review has happened on a significant change — run `/review-code` first; this skill assumes the work has been judged ready
+
+## Phase 1 — Verify
+
+Before offering any option, run the project's verification. Delegate to `verification.md` discipline:
+
+- Tests pass (full suite, not just the last one you wrote)
+- Linter clean (no new warnings)
+- Type checker clean (no new errors)
+- Staged diff matches intent (no accidental additions)
+
+Commands vary by stack. Use what the project's Makefile, `package.json` scripts, or convention dictates. Examples:
+
+- JavaScript / TypeScript: `npm test && npm run lint && npx tsc --noEmit`
+- Python: `pytest && ruff check . && pyright`
+- Go: `go test ./... && go vet ./... && golangci-lint run`
+- Elisp: `make test && make validate-parens && make validate-modules`
+
+**If verification fails:** stop. Report the failures with file:line, do not offer the outcome menu. The branch isn't finished — fix the failures and re-invoke `/finish-branch`.
+
+**If verification passes:** continue to Phase 2.
+
+If any verification step fails and triggers sub-investigations, follow `subagents.md` — dispatch a fresh fix-agent per failure with pasted error context rather than retrying in the orchestrator.
+
+## Phase 2 — Determine Base Branch
+
+The four outcomes need to know what the branch is being integrated into. Find the base:
+
+```bash
+git merge-base HEAD main 2>/dev/null \
+  || git merge-base HEAD master 2>/dev/null \
+  || git merge-base HEAD develop 2>/dev/null
+```
+
+If none resolves, ask: "I couldn't find a merge base against `main`, `master`, or `develop`. What's the base branch for this work?" Don't guess.
+
+Also collect:
+- Current branch name: `git branch --show-current`
+- Commit range to integrate: `git log <base>..HEAD --oneline`
+- Unpushed commits on the remote: `git log @{u}..HEAD` if the branch tracks a remote, otherwise all commits are local
+- Whether the current directory is in a git worktree: `git worktree list | grep $(git branch --show-current)`
+
+These details feed the outcome prompt and the execution phases.
+
+## Phase 3 — Offer the Menu
+
+Present exactly these four options. Don't editorialize. Don't add a "recommendation." The user picks.
+
+```
+Branch <branch-name> is ready. It has N commits on top of <base-branch>.
+
+What would you like to do?
+
+1. Merge locally into <base-branch>     (integrate now, delete the branch)
+2. Push and open a Pull Request          (remote integration flow)
+3. Keep as-is                             (preserve branch and worktree for later)
+4. Discard                                (delete branch and commits — requires confirmation)
+
+Pick a number.
+```
+
+If the branch already has a remote and the user chose 1 (merge locally), note: "The branch is pushed to `origin/<branch-name>`. After merging locally, do you also want to delete the remote branch?" Ask; don't assume.
+
+**Stop and wait for the answer.** Don't guess, don't infer from context, don't proceed to Phase 4 until the user picks.
+
+## Phase 4 — Execute the Chosen Outcome
+
+### Option 1 — Merge Locally
+
+```bash
+git checkout <base-branch>
+git pull                              # ensure base is current
+git merge --no-ff <branch-name>       # --no-ff preserves the branch point in history
+```
+
+**Re-verify the merged result.** Run tests / lint / type check on the merged state — the merge may have integrated cleanly at the text level while breaking semantically.
+
+If verification passes:
+```bash
+git branch -d <branch-name>           # safe delete (refuses unmerged branches, which this one is merged)
+```
+
+If the branch had a remote:
+- If user confirmed removing the remote: `git push origin --delete <branch-name>`
+- Otherwise: leave the remote branch, note that the user should clean it up manually
+
+Clean up the worktree (Phase 5).
+
+### Option 2 — Push and Open a Pull Request
+
+```bash
+git push -u origin <branch-name>      # -u sets upstream on first push
+```
+
+Open the PR. Use the project's `gh` CLI (install via `deps` target if missing):
+
+```bash
+gh pr create \
+  --base <base-branch> \
+  --title "<subject from the most recent commit, or user-provided>" \
+  --body "$(cat <<'EOF'
+## Summary
+
+<two or three bullets summarizing what changed, pulled from the commit range>
+
+## Test Plan
+
+- [ ] <steps the reviewer should take to verify>
+EOF
+)"
+```
+
+**Commit message and PR body discipline:** no AI attribution, no "🤖 Generated with" footer, conventional message style — see `commits.md`. If the project has a `.github/pull_request_template.md`, use it instead of the template above.
+
+**Do NOT clean up the worktree.** The branch is not yet merged; you may need the worktree for reviewer feedback, fixes, or rebase. (Phase 5 table.)
+
+### Option 3 — Keep As-Is
+
+No git state changes. Report:
+
+```
+Keeping branch <branch-name> as-is.
+Worktree preserved at <worktree-path> (or "same working directory" if not a worktree).
+Resume later with `git checkout <branch-name>` or re-invoke `/finish-branch`.
+```
+
+**Do NOT clean up the worktree.** The user explicitly wants to come back.
+
+### Option 4 — Discard
+
+**Confirmation gate — required.** Write out what will be permanently lost:
+
+```
+Discarding will permanently delete:
+
+- Branch: <branch-name>
+- Commits that exist only on this branch (N commits):
+    <list, abbreviated if very long>
+- Worktree at <worktree-path> (if applicable)
+- Remote branch origin/<branch-name> (if it exists)
+
+This cannot be undone via `git checkout` — only via the reflog (≤30 days by default).
+
+To proceed, type exactly: discard
+```
+
+**Wait for the user to type the literal word `discard`.** Anything else — "yes," "y," "confirm," a number — does not qualify. Re-prompt.
+
+If confirmed:
+
+```bash
+git checkout <base-branch>
+git branch -D <branch-name>                           # force delete
+git push origin --delete <branch-name> 2>/dev/null    # delete remote if it exists; ignore error if not
+```
+
+Clean up the worktree (Phase 5).
+
+## Phase 5 — Worktree Cleanup
+
+| Option | Cleanup worktree? |
+|---|---|
+| 1. Merge locally | **Yes** |
+| 2. Push + PR | **No** (may still be needed for review feedback) |
+| 3. Keep as-is | **No** (user explicitly wants it) |
+| 4. Discard | **Yes** |
+
+**If cleanup applies:**
+
+```bash
+git worktree list                                     # confirm you're in a worktree
+git worktree remove <worktree-path>                   # non-destructive if clean
+```
+
+If `git worktree remove` refuses (unclean state somehow): surface the reason to the user. Don't force removal without their consent — a dirty worktree may contain work they intended to rescue.
+
+**If no cleanup:** done. Report final state.
+
+## Output
+
+Short final report, not a celebration:
+
+```
+Branch <branch-name>:
+  - Outcome: <1 | 2 | 3 | 4>
+  - <specific state change, e.g. "merged into main; branch deleted; worktree removed">
+  - Next: <what the user would do next — e.g. "await PR review", "resume work", "start a new branch">
+```
+
+No emojis. No "🎉 all done!" No AI attribution. See `commits.md`.
+
+## Critical Rules
+
+**DO:**
+- Run verification before offering the outcome menu. No exceptions.
+- Present exactly four options, clearly labeled. The forced choice is the point.
+- Require the literal word `discard` for Option 4.
+- Re-verify after a merge (Option 1) — merges can integrate textually while breaking semantically.
+- Clean up worktrees only for Options 1 and 4.
+
+**DON'T:**
+- Offer options with failing verification — the branch isn't finished.
+- Editorialize the menu ("you should probably do option 2"). The user picks.
+- Accept "y" or "yes" for the discard gate. Literal word `discard`.
+- Clean up worktrees after Option 2 or 3 — the user needs them.
+- Add AI attribution to commit messages, PR descriptions, or output.
+
+## Common Mistakes This Prevents
+
+- **Open-ended "what now?" at the end of work.** Natural but corrosive — the user either has to improvise the workflow or restate their preference each time. The four-option menu ends the improvisation.
+- **Accidental destructive deletes.** "Discard this branch?" → "y" → 3 days of work gone. The typed-word gate turns one muscle-memory keystroke into a deliberate action.
+- **Merge-then-oops.** Text-level merge completes; semantic integration is broken; the user didn't notice because they didn't re-run the tests on the merged result. Phase 4 Option 1 re-verifies.
+- **Worktree amnesia.** Cleaning up a worktree after Option 2 (push + PR) means losing local state just when the reviewer asks for a fix. The cleanup matrix keeps the worktree exactly when it's still needed.
+- **"Generated by Claude Code" trailing into a PR.** The no-attribution rule from `commits.md` applies here too — the PR body is committed content under the project's identity, not Claude's.
+
+## Integration with Other Skills
+
+**Before this skill:**
+- `/review-code` — run a review on the branch before finishing it for significant changes
+- `/arch-evaluate` — if the branch touches architecture, audit against `.architecture/brief.md`
+
+**After this skill:**
+- If Option 2 (PR opened): reviewer feedback comes in → address → `/finish-branch` again on updated state
+- If Option 1 (merged locally): branch is done; if this closes a ticket / ADR, update tracking
+- If Option 3 (kept): resume later; re-invoke when ready
+- If Option 4 (discarded): often paired with `/brainstorm` or `/arch-design` to retry the problem differently
+
+**Companion rules (not skills) this skill defers to rather than re-teaching:**
+- Verification discipline → `verification.md`
+- Commit message conventions + no AI attribution → `commits.md`
+- Review discipline (for anything pre-merge) → `review-code`
diff --git a/.claude/commands/prompt-engineering.md b/.claude/commands/prompt-engineering.md
new file mode 100644
index 0000000..d566286
--- /dev/null
+++ b/.claude/commands/prompt-engineering.md
@@ -0,0 +1,228 @@
+---
+name: prompt-engineering
+description: Craft prompts (commands, hooks, skill descriptions, sub-agent instructions, system prompts, one-shot requests to other LLMs) that do what they're meant to and resist common failure modes. Covers four moves that determine whether a prompt holds up: classify the prompt type (discipline-enforcing / guidance / collaborative / reference) to pick the right tone and techniques; apply the persuasion framework appropriate to that type (seven principles from Meincke et al. 2025, including which to avoid — notably Liking, which breeds sycophancy); match task fragility to degrees of freedom (high/medium/low); and spend the context window like a shared resource. Also contains a brief reference for classical techniques (few-shot, chain-of-thought, system prompts, templates). Use both in design mode (asking for help writing a new prompt from scratch) and critique mode (paste a draft, get it rewritten to resist common failure modes). Do NOT use for prose editing unrelated to LLM prompts (use a writing skill), for implementing application code that uses an LLM (different scope), or for content moderation / prompt-injection defense (adjacent but separate domain).
+---
+
+# Prompt Engineering
+
+Every command, hook, skill description, sub-agent instruction, and system prompt is a prompt. This skill helps make them resist the failures that chew through engineering time: sycophantic sub-agents, vague rules agents rationalize around, verbose instructions that cost tokens without changing behavior, trigger descriptions that don't actually trigger.
+
+Two modes of use:
+
+- **Design mode** — you need a new prompt. Ask what you're building, and this skill walks through type-classification and technique selection.
+- **Critique mode** — you have a draft. Paste it, and this skill identifies the prompt type, flags failure risks, and rewrites.
+
+## When to Use
+
+- Writing a new Claude Code command, hook, skill SKILL.md, or sub-agent instruction
+- Writing a system prompt for any LLM interaction
+- Debugging a prompt that's misbehaving (sub-agent drifts, skill under-triggers, rule gets rationalized around)
+- Reviewing someone else's prompt before it ships
+
+## When NOT to Use
+
+- General prose editing unrelated to LLM prompts (use a writing skill)
+- Implementation code that calls an LLM (different scope — covered by language skills)
+- Prompt-injection defense / content moderation (adjacent; separate discipline)
+- Casual single-turn requests where a prompt's behavior isn't production-critical
+
+## The First Move: Classify the Prompt Type
+
+Prompts fail in different ways depending on what they're for. Before writing technique, name the type. Four categories:
+
+### 1. Discipline-enforcing
+
+**What it is:** The prompt must produce the *same* behavior every time, under any condition. No judgment calls, no rationalization.
+
+**When you have one:** Security checks, deployment gates, safety rules, pre-commit hooks, FedRAMP-style compliance verifiers, PII scanners, rate-limit enforcers.
+
+**Failure mode:** The model finds reasons to deviate ("but in this case, maybe..."). Soft verbs like "should try" or "generally" are rationalization fuel.
+
+**Success criteria:** Identical output on edge cases. The model never "makes an exception."
+
+### 2. Guidance / technique
+
+**What it is:** The prompt teaches a technique and expects the model to apply it with judgment. Outcome varies with context.
+
+**When you have one:** Refactoring patterns, debugging workflows, test-writing guidance, architecture advice, code review checklists.
+
+**Failure mode:** Either over-constrained (turns judgment into cargo-culting) or under-constrained (model freelances and misses the pattern).
+
+**Success criteria:** Technique applied correctly across varied inputs; model can explain why it deviated when it did.
+
+### 3. Collaborative
+
+**What it is:** Peer-level work. Review, brainstorm, pair-editing, technical feedback. The user wants an honest second opinion, not a cheerleader.
+
+**When you have one:** Code review agents, design-review agents, editing help, technical critique sub-agents, brainstorming partners.
+
+**Failure mode:** **Sycophancy.** Soft-pedaling real issues because the default LLM tone is agreeable. This is the Liking principle firing by default, and for collaborative prompts it's usually *bad*.
+
+**Success criteria:** Honest assessment. Willing to say "this is wrong" or "there are no issues." Structured output to prevent hand-waving.
+
+### 4. Reference
+
+**What it is:** Metadata, index content, trigger descriptions, glossaries, API references. The prompt's job is to be findable and clear, not to change behavior.
+
+**When you have one:** Skill frontmatter descriptions, API endpoint docs, keyword lists for search, glossary entries.
+
+**Failure mode:** Under-triggering (skill never fires) or ambiguity (reader can't tell scope).
+
+**Success criteria:** Triggers reliably on the intended phrases; explicit about what's in and out of scope.
+
+## The Persuasion Framework
+
+LLMs are parahuman — they were trained on human text that's full of persuasion signals, and they respond to those same signals. Use this deliberately, not unconsciously.
+
+### The Seven Principles
+
+Adapted from Meincke et al., *Persuasion and Compliance in Large Language Models* (2025, N≈28,000 conversations). Two-principle combinations shifted compliance rates from ~33% to ~72%.
+
+- **Authority** — Non-negotiable framing. "YOU MUST", "Never", "Always", "No exceptions."
+- **Commitment** — Force explicit action. "Announce the rule you're applying before applying it." "Output your checklist with each item checked."
+- **Scarcity** — Time-bound or sequence-bound. "Before proceeding." "Immediately after X." "First, then."
+- **Social Proof** — Universal pattern language. "Every time." "The failure mode is always…". "Skipping this step = bug in production."
+- **Unity** — Collaborative framing. "We're colleagues." "Our codebase." "Let's work through this."
+- **Reciprocity** — Exchanges of effort. Rarely needed for prompts; often reads as manipulative.
+- **Liking** — Warmth and agreement. **Creates sycophancy.** Usually actively harmful for work where you need honest assessment.
+
+### Which Principles By Prompt Type
+
+| Prompt Type | Use | Avoid |
+|---|---|---|
+| Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
+| Guidance | Moderate Authority + Unity | Heavy Authority (feels dogmatic); Liking |
+| Collaborative | Unity + Commitment | Authority (hostile), **Liking (sycophancy)** |
+| Reference | Clarity alone — minimize persuasion | All persuasion framings |
+
+### Why This Matters
+
+- **Bright-line rules reduce rationalization.** "Never commit without running tests" leaves no exception space. "Try to run tests before committing" invites "well, this is just a typo fix, surely…"
+- **Implementation intentions create automatic behavior.** "When you see X, do Y" beats "generally, do Y where appropriate."
+- **Liking is the default LLM tone.** If you don't actively counteract it on collaborative prompts, you'll get agreement bias.
+
+## Degrees of Freedom
+
+Match prompt specificity to how much the task tolerates variation.
+
+- **Low freedom** — exact script, exact format, no deviation. Use when consistency is the whole point: deployment commands, migration scripts, checksum comparisons. "Run `python scripts/migrate.py --verify --backup`" not "run the migration script."
+- **Medium freedom** — parameterized pattern or pseudocode skeleton. The model fills in pieces within a named structure. Most common. Use for most skills and commands.
+- **High freedom** — open instructions, multiple valid approaches. Use when context drives the answer and you trust the model to navigate. Code review, brainstorming, exploratory debugging.
+
+Mismatches to avoid:
+
+- High-freedom prompt for a low-tolerance task (deployment, security, crypto): the model improvises; production breaks
+- Low-freedom prompt for an open task (code review, design): the model produces identical output regardless of context; signal is lost
+
+## Context Window Economics
+
+The context window is a shared resource. Every token spent on verbose instructions is a token not available for the actual work or for other loaded context.
+
+Questions to ask of each sentence in a prompt:
+
+- Does the model need this? Or can it be assumed?
+- Is this stable enough to go in the system prompt, freeing up the user-message budget?
+- Does this paragraph justify its token cost, or is it restating what a nearby paragraph already said?
+- Would a short example convey what three sentences of description do?
+
+**Progressive disclosure for skills specifically:** Skill metadata (name + description) is always in context; the SKILL.md body loads only when the skill triggers; referenced files load only when needed. Put triggering content in the frontmatter, procedural content in the body, exhaustive reference material in linked files.
+
+## Classical Techniques — Brief Reference
+
+Covered extensively elsewhere (Anthropic docs, OpenAI cookbook, the prompt-engineering literature). Short pointers:
+
+- **Few-shot learning** — Include 2-5 input/output examples rather than describing the pattern in prose. Best for consistent formatting, edge case handling, subtle behaviors that are easier to show than tell.
+- **Chain-of-thought** — Request step-by-step reasoning before the answer. Add "think through this step by step" for zero-shot; include a worked reasoning trace for few-shot. Best for multi-step logic; significant accuracy gains on analytical tasks.
+- **System prompts** — For stable role, constraints, and output format that persist across turns. Keep user messages for the varying task content.
+- **Templates** — Parameterize prompts with variables for reuse. Version-control them as code.
+
+These are well-documented; don't re-teach here. Lean toward the four moves (type, persuasion, freedom, economics) — those are what most prompt work actually needs.
+
+## Design Mode Workflow
+
+When asked to design a prompt from scratch:
+
+1. **Classify the type.** Ask the user to pick one, explaining each:
+
+   > - **Discipline-enforcing** — must behave identically every time (security, deployment, safety)
+   > - **Guidance** — teaches a technique the model applies with judgment (refactoring, debugging)
+   > - **Collaborative** — peer work; honest assessment matters more than agreement (review, critique)
+   > - **Reference** — metadata, index, trigger description (skill frontmatter, glossary)
+
+2. **Confirm success criteria.** How will the user know the prompt works? One measurable sentence.
+
+3. **Pick persuasion principles** from the by-type matrix. Name them to the user so the framing is deliberate.
+
+4. **Match degrees of freedom** to task fragility. Explain the choice.
+
+5. **Draft and review.** Write the prompt; walk the user through the choices. Short draft beats long. Bold imperative beats hedging.
+
+6. **Apply the ethics test** (see below). If the prompt would embarrass the user if its recipient understood the techniques being used, revise.
+
+## Critique Mode Workflow
+
+When handed an existing draft:
+
+1. **Classify what type it actually is** — often different from what the user thinks it is. A "review" agent phrased in soft Liking language is a collaborative prompt failing at sycophancy.
+2. **Audit for principle mismatches.** Is a collaborative prompt using Authority? Is a discipline prompt hedging? Is Liking sneaking into a review prompt?
+3. **Check degrees of freedom** against task fragility.
+4. **Check token economy.** Redundancy, restated instructions, unnecessary preamble.
+5. **Check for footguns** from the anti-patterns list.
+6. **Rewrite** — show before/after. Name the changes.
+
+## Ethics Test
+
+Use persuasion to make prompts reliable, not to manipulate. Before shipping any prompt that uses strong Authority, Commitment, or Social Proof framing:
+
+> If the person being persuaded (the LLM, yes, but also any human user downstream of its output) fully understood the techniques in use, would they still consent to the behavior being requested?
+
+If the answer is "no" or "I'd rather they didn't think about it" — revise. Legitimate uses: preventing security slips, enforcing reliability, protecting the user's genuine interests. Illegitimate: extracting consent, bypassing reasonable caution, guilt-inducing framing to force output.
+
+Don't confuse discipline with manipulation. Discipline serves the user; manipulation serves the prompt author's convenience at the user's expense.
+
+## Anti-Patterns
+
+- **Liking by default on a collaborative prompt.** Soft, agreeable language in a review or critique agent produces "looks good to me" theatre. Use Unity + Commitment; strip Liking explicitly.
+- **Hedging on a discipline-enforcing prompt.** "Please try to" and "generally" are loopholes. If the rule is absolute, use absolute language.
+- **Authority stack on a guidance prompt.** All-caps MUSTs on a technique skill feel dogmatic and cargo-cult; the model obeys surface form while missing the idea.
+- **High freedom on a fragile task.** "Deploy the service" invites improvisation; "run `deploy.sh --env prod --verify`" doesn't.
+- **Restating the instruction in prose.** If a sentence is repeating what the previous one said, cut one of them.
+- **Example pollution.** 12 few-shot examples when 3 would do. The model generalizes better from fewer good examples than from many with subtle inconsistencies.
+- **Instruction buried below prose.** Put the action up front. "Read this and then do X" gets the action; "Here's some context about Y, and also we want Z, and consider that…" buries it.
+- **Ambiguous scope in reference prompts.** Trigger descriptions without explicit *do-not-trigger* conditions under-fire and mis-fire.
+- **Persuasion without purpose.** Adding "YOU MUST" or "Every time" because it sounds rigorous — if the task tolerates variance, strong framing just hardens the wrong behaviors.
+
+## Review Checklist
+
+Before declaring a prompt done:
+
+- [ ] The type is classified (discipline / guidance / collaborative / reference)
+- [ ] The persuasion principles match the type (and Liking is explicitly excluded for collaborative)
+- [ ] Degrees of freedom match the task's fragility
+- [ ] Each sentence earns its tokens
+- [ ] Trigger phrases and (for reference prompts) explicit negative conditions are present
+- [ ] Sycophancy traps (for collaborative) and rationalization loopholes (for discipline) are closed
+- [ ] The ethics test passes
+
+## Output
+
+Return the drafted or critiqued prompt, with a short note explaining:
+
+1. Prompt type you identified
+2. Persuasion principles applied (and which you explicitly avoided)
+3. Degrees-of-freedom level chosen
+4. One or two things the revision improved
+
+No long explanations. The prompt itself should demonstrate the principles, not narrate them.
+
+## Related Skills
+
+- **`codify`** — after shipping a prompt that worked well, capture the pattern into `CLAUDE.md`
+- **`brainstorm`** — when the *shape* of the prompt isn't clear yet, run brainstorm first to nail down the requirement, then return here for the prompt itself
+- **`arch-decide`** — if a prompt-design choice is significant enough to document (e.g., "we standardize on Authority + Commitment for all deployment prompts"), record it as an ADR
+
+## References
+
+- Meincke, L., et al. (2025). *Persuasion and Compliance in Large Language Models.* N≈28,000 AI conversations; 7 persuasion principles; compliance shifts of ~2x with appropriate combinations.
+- Anthropic prompt engineering guidance — context window as shared resource; progressive disclosure; degrees-of-freedom framing.
+- Classical prompt engineering literature (few-shot, CoT, system prompts) — assumed background; not re-taught here.
diff --git a/.claude/commands/respond-to-cj-comments.md b/.claude/commands/respond-to-cj-comments.md
new file mode 100644
index 0000000..61cf533
--- /dev/null
+++ b/.claude/commands/respond-to-cj-comments.md
@@ -0,0 +1,232 @@
+# /respond-to-cj-comments — Process `cj:` Annotations in a File
+
+Scan a file for `cj:` comments (Craig's in-line instructions and questions) and handle each one with subagent-delegated accuracy. Used to batch-process notes Craig leaves in documents, code, or org files for later action.
+
+## Usage
+
+```
+/respond-to-cj-comments [FILE_PATH]
+```
+
+If no file is given, ask the user for the target file. If the user references "this file" or similar with a visible editor buffer, confirm the path before starting.
+
+## What counts as a `cj:` comment
+
+A `cj:` comment is any line where the text `cj:` (case-insensitive) starts the comment content. The comment marker varies by file type:
+
+| File type                                      | Example                                                      |
+|------------------------------------------------|--------------------------------------------------------------|
+| Org / plain text                               | `cj: what does this section actually deliver?`               |
+| Markdown                                       | `<!-- cj: split this paragraph before the Scope section -->` |
+| Python / shell / YAML                          | `# cj: check whether this still matches reality`             |
+| JavaScript / C / Java / TypeScript / Rust / Go | `// cj: verify the error wrapping logic here`                |
+| Lisp / Elisp / Scheme                          | `;; cj: is this hook still wired?`                           |
+| LaTeX                                          | `% cj: rewrite, this sounds too corporate`                   |
+| HTML / XML                                     | `<!-- cj: move this above the fold -->`                      |
+
+Multi-line comments are supported when continuation lines keep the same comment prefix:
+
+```
+# cj: please check whether the feature flag is still wired.
+# also confirm the fallback path matches the diagram in docs/arch.org.
+```
+
+Treat the whole contiguous block as one `cj:` item.
+
+## Instructions
+
+### 1. Scan the file
+
+Read the target file and collect every `cj:` comment. For each one, record:
+
+- File path and line number (or range for multi-line items)
+- Enclosing context:
+  - *Org file:* the parent `* TODO` / heading path (e.g. `Parent ▸ Subheading ▸ TODO Foo`)
+  - *Code:* the surrounding function or class
+  - *Other:* a few lines of surrounding prose
+- The full comment text, including continuation lines
+- The raw comment-marker style so the line can be located and removed later
+
+### 2. Classify each item
+
+For each `cj:` comment, decide whether it's:
+
+- **Instruction** — a request for action. Signals: imperative verbs (check, rewrite, fix, remove, add, draft), "please do X".
+- **Question** — a request for information. Signals: ends with `?`, starts with who/what/when/where/why/how, "is this", "does this", "should we".
+- **Both** — contains an instruction and a question. Handle the instruction side, answer the question side.
+
+When truly ambiguous, treat as both.
+
+### 3. Delegate to subagents — accuracy over speed
+
+For each non-trivial item, spawn an Agent subagent. Trivial items (e.g. "remove this blank line") the main thread can handle directly without delegation.
+
+Two classes of subagent:
+
+- **Instruction subagent** — given the comment, the enclosing context, and the file path, figures out what change is needed and reports a concrete recommendation. Reports: what should change, file paths + line numbers, any follow-up the user needs to decide.
+- **Question subagent** — researches the question. For questions requiring codebase exploration, ticket lookups, web research, or cross-file reasoning, give the subagent explicit scope (files to read, tickets to check, search terms). Reports: direct answer, evidence (file:line refs, quotes, sources), confidence level, any remaining unknowns.
+
+Run subagents in parallel when the items are independent. Wait for review between batches if items depend on each other.
+
+Prompt every subagent with four required fields (see `subagents.md` for the full contract):
+
+1. **Scope** — one bounded comment, named file, specific action or question
+2. **Context** — paste the comment verbatim plus the surrounding context from step 1
+3. **Constraints** — do not touch other `cj:` comments, do not refactor unrelated code, preserve file formatting
+4. **Output format** — for instructions: a specific change proposal (before/after or file:line patch), plus URLs or file paths for every external source consulted. For questions: answer + evidence (including URLs or file:line refs for every claim) + confidence level + under 300 words. Subagents must cite sources. A claim without a URL or file reference doesn't belong in the report.
+
+The main thread applies edits. Subagents report; they do not write to the source file. This keeps formatting consistent and makes review easier.
+
+**Accuracy over speed is the rule.** Three subagents and fifteen minutes to answer one question correctly beats a fast wrong answer.
+
+### 4. Apply changes
+
+For **instructions**:
+
+1. Review the subagent's proposed change before applying. Check that it matches the comment's intent and doesn't introduce a new issue. If the proposal looks wrong or incomplete, dispatch a fix subagent with the failure report as its context. Don't retry the diagnosis in the main thread (see the Context-Pollution Rule in `subagents.md`). After two failed fix attempts, stop and surface the problem to the user.
+2. Apply the change once the proposal is sound. Edits come from the main thread.
+3. **Org-mode subheader format.** When inserting new content under an existing org-mode task (research findings, drafted messages, log entries, recorded Slack replies), use a timestamped subheader one level deeper than the parent task:
+
+   ```
+   *** 2026-04-23 Thu @ 15:20:52 -0500 <short description>
+   <content>
+   ```
+
+   Use one more `*` than the parent task's heading level. If the parent is `** TODO`, the subheader is `***`. If the parent is `**** TODO`, the subheader is `*****`. Generate the timestamp with `date "+%Y-%m-%d %a @ %H:%M:%S %z"` so it's accurate, not estimated.
+
+4. **Surface URLs.** If the subagent's output includes URLs, file paths, or external references, list them under the updated task content. After applying the edit, offer to convert them into explicit org-mode links (`[[url][label]]` or `[[file:path][label]]`) at the location where the `cj:` comment originated, or elsewhere in the task if that fits better.
+
+5. If the comment is inside an org-mode `TODO` heading, mark that `TODO` as `DOING` when work begins. Leave it `DOING` for the user to advance to `DONE` after review.
+
+6. Remove the `cj:` comment from the file (the entire contiguous block, including continuation lines).
+
+For **questions**:
+
+1. Capture the answer in the summary (step 5).
+2. Remove the `cj:` comment. The user can re-open the thread conversationally if they have follow-ups.
+
+For **writing destined for public channels** (commit messages, PR descriptions, PR comments, Slack or email messages, public docs):
+
+1. Invoke `/humanizer` on the draft.
+2. Apply these voice and attribution rules to the writing you produced (adapted from `/home/cjennings/code/rulesets/claude-rules/commits.md`):
+
+   **No AI attribution, anywhere.** Never include AI, LLM, Claude, or Anthropic attribution in any public-facing text. That means:
+   - No `Co-Authored-By: Claude` (or Claude Code, or any AI) trailers
+   - No "Generated with Claude Code" footers or equivalents
+   - No emojis implying AI authorship
+   - No references to Claude, Anthropic, LLM, or "AI tool" as a credited contributor
+   - Strip any attribution a tool or template inserts by default
+
+   **First person where it fits.** When the subject is Craig or a decision he made, use "I". When the subject is a team decision or shared rationale, "we" fits. Name other authors when crediting their work ("Kostya's PR #116 did X"). Avoid third-person press-release voice ("This PR introduces X", "This change restores Y") unless the code or system is the actor.
+
+   **Brief. Terse is fine.** A one-sentence body beats a paragraph saying the same thing. If the subject line covers it, skip the body entirely. Cut every clause that restates what the diff, the card, or the surrounding context already shows. Length isn't a proxy for care. Rhetorical padding ("worth noting", "it's important to understand") always comes out.
+
+   **Kind.** Text directed at a specific person: acknowledge them when it fits, without pouring it on. Frame disagreement as your read ("I think...", "my read was...", "did you mean X?"). Leave room for the other person to have seen something you didn't. A polite question beats a defensive explanation.
+
+   **Plain English.** Complete sentences a low- or mid-level engineer who isn't a native speaker can read without stumbling. Replace semicolons with periods or commas. Prefer contractions ("it's", "that's", "don't", "we're"). Split long sentences on conjunctions ("so", "and", "but") when meaning survives the split. No em-dashes, use regular dashes instead. Em-dashes break in GitHub, Linear, and Slack.
+
+   **Don't stack jargon.** A sentence that chains three or more type signatures, API names, or compiler concepts reads as a wall. Break it into shorter sentences. Keep the terms a reader will grep for, drop the ones that name compiler internals.
+
+3. Scan the output for AI-attribution tells. If you catch yourself having written any of these, stop, delete, and rewrite:
+   - `Co-Authored-By: Claude`
+   - `Generated with Claude Code` (with or without the robot emoji)
+   - "Created with Claude Code"
+   - "Assisted by AI"
+
+   Rewrite as Craig would write it: concise, focused on the content, with no mention of how the text was produced.
+
+If the writing will be *posted* (not just saved as a draft), follow the Review and Publish flow in `commits.md` — draft to `/tmp/`, open in `emacsclient -n`, wait for Craig's explicit approval before posting.
+
+**Private writing** (todo.org entries, session-context.org, scratch notes, internal rulesets) skips the humanizer pass. Voice rules still apply where relevant, but the overhead of a draft-and-approve cycle is not warranted.
+
+### 5. Report
+
+Produce one summary at the end, structured:
+
+```
+## Summary
+
+### Instructions (N)
+
+1. <file:line> <one-line recap>
+   → done. <brief what-changed>.
+   → TODO status: <DOING | unchanged>.
+2. ...
+
+### Questions (N)
+
+1. <file:line> Q: <comment verbatim, trimmed to one line>
+   A: <direct answer>
+   Evidence: <file:line refs, ticket IDs, URLs, quotes>
+   Confidence: <high | medium | low with caveat>
+2. ...
+
+### Follow-ups
+
+- <anything the user needs to decide, review, or act on>
+
+### Unresolved
+
+- <any `cj:` comments that couldn't be handled — say why, leave them in place>
+
+### File state
+
+State explicitly one of:
+- "File is now clean. All `cj:` comments resolved and removed."
+- "File still contains N unresolved `cj:` comment(s), listed under Unresolved above."
+
+Never leave the reader guessing about whether the file is ready for follow-up work.
+```
+
+Keep answers direct. If a question has a simple answer, one sentence. If it needs nuance, two or three. Do not pad. The user reads the summary, not the intermediate work.
+
+**Long summaries go to a file.** If the summary runs beyond six to eight items, or beyond ~500 words of prose, write it to `/tmp/respond-to-cj-summary-YYYY-MM-DD.org` as an org file and open it in `emacsclient -n <path>`. The user may annotate the file, add new `cj:` comments in-line, or answer open questions in-line, then save. After the user saves, re-read the file for any additional instructions or answers and handle those before ending the run. A long summary dumped straight into chat is hard to comment on and hard to return to; the file form makes it reviewable and iterable.
+
+**Every file path in the summary must be a clickable org-mode link.** Use absolute paths: `[[file:/absolute/path/to/file][display label]]`. Plain paths wrapped in `=verbatim=` markers aren't clickable in emacs. URLs in the summary should also use link form: `[[https://...][label]]`. This applies to every file reference in the summary file — source files read, artifacts created, archive destinations, anywhere the user might want to jump. If there's a path in the summary, it should open when clicked.
+
+### 6. Cleanup pass
+
+Remove every `cj:` comment that was handled in step 4 (instructions done) or step 5 (questions answered). The file is clean after the skill runs. Any comment left unresolved stays in place and is listed under `### Unresolved` in the summary with the reason.
+
+Confirm the cleanup by re-scanning the file after removals. If any `cj:` line survives that shouldn't, remove it and note the correction.
+
+## Principles
+
+- **Accuracy > speed.** Subagent generously. A wrong answer to a `cj:` comment is worse than a slow one.
+- **Don't guess.** If a question needs verification and verification isn't possible, say so. Surface unknowns rather than fabricate.
+- **Preserve the file.** Don't reformat surrounding lines. Don't reorder tasks. Touch only what the comment asks for, plus the comment's own removal.
+- **The user reads the summary, not the process log.** Write summaries that are directly useful.
+- **Public writing gets humanizer + commits.md. Private writing doesn't.** Don't over-process internal notes.
+
+## Anti-patterns
+
+- Handling complex `cj:` items inline in the main thread instead of subagenting.
+- Batching unrelated `cj:` comments into one giant subagent prompt.
+- Removing a `cj:` comment before the user has seen the answer in the summary.
+- Skipping the humanizer + commits.md pass on public-facing writing because it "looked fine already."
+- Guessing on a question instead of spawning a research subagent.
+- Letting a subagent edit the source file directly — review surface loss.
+
+## Example
+
+Input file: `todo.org` containing:
+
+```
+** TODO Draft the D2P2 Pillar 1 writeup
+cj: what do we actually know about Orion's Belt's partner API contract?
+cj: check whether Redwire's AdTech feed is still in scope. Eric mentioned deprecating it in sprint planning.
+```
+
+Skill run:
+
+1. Scans, finds two `cj:` items (one question, one instruction-with-verification).
+2. Marks the parent `** TODO` as `DOING`.
+3. Spawns two subagents in parallel:
+   - **Q subagent:** reads `deepsat/knowledge.org`, greps ticket history for "Orion's Belt API", checks recent meeting transcripts. Reports findings with evidence.
+   - **Instruction-with-verification subagent:** reads sprint-planning transcripts, searches for "Redwire AdTech deprecation", checks Linear for related tickets. Reports whether the feed is still in scope.
+4. Main thread applies the `DOING` status change, writes the summary with both answers, removes both `cj:` comments.
+5. Reports:
+   - Q1: What we know about Orion's Belt's partner API contract → answer + file:line refs + confidence.
+   - Q2/Instruction: Redwire AdTech scope → answer + evidence + recommendation for whether to cite in Pillar 1.
+
+Craig reads the summary, decides what to do with the Redwire finding, and the TODO stays in `DOING` until he advances it.
diff --git a/.claude/commands/respond-to-review.md b/.claude/commands/respond-to-review.md
new file mode 100644
index 0000000..3508fa5
--- /dev/null
+++ b/.claude/commands/respond-to-review.md
@@ -0,0 +1,54 @@
+# /respond-to-review — Evaluate and Implement Code Review Feedback
+
+Evaluate code review feedback technically before implementing. Verify suggestions against the actual codebase — don't implement blindly.
+
+## Usage
+
+```
+/respond-to-review [PR_NUMBER]
+```
+
+If no PR number is given, check the current branch's open PR for pending review comments.
+
+## Instructions
+
+### 1. Gather the Feedback
+
+- Fetch review comments using `gh api repos/{owner}/{repo}/pulls/{number}/comments` (for inline review comments) and `gh api repos/{owner}/{repo}/issues/{number}/comments` (for top-level PR conversation comments)
+- Read each comment in full. Group related comments — reviewers often raise connected issues across multiple comments.
+
+### 2. Evaluate Each Item
+
+For each review comment, before implementing anything:
+
+1. **Restate the suggestion in your own words** — make sure you understand what's being asked. If you can't restate it clearly, ask for clarification before touching code.
+2. **Verify against the codebase** — check whether the reviewer's premise is correct. Reviewers sometimes misread code, miss context, or reference outdated patterns. Read the actual code they're commenting on.
+3. **Check YAGNI** — if the reviewer suggests a "more proper" or "more robust" implementation, grep the codebase for actual usage. If nothing uses the suggested pattern today, flag it: "This would add complexity without current consumers. Should we defer until there's a concrete need?"
+4. **Assess the suggestion** — categorize as:
+   - **Correct and actionable** — implement it
+   - **Correct but out of scope** — acknowledge and create a follow-up issue
+   - **Debatable** — present your reasoning and ask for the reviewer's perspective
+   - **Incorrect** — explain why with evidence (file paths, test results, documentation)
+
+### 3. Respond
+
+- Lead with technical substance, not agreement
+- If you disagree, explain why with code references — not opinion
+- If you're unsure, say so and ask a specific question
+- Never implement a suggestion you don't understand
+
+### 4. Implement
+
+- Address simple fixes first, complex ones after
+- Test each change individually — don't batch unrelated fixes into one commit
+- Run the full test suite after all changes
+- Commit with conventional messages referencing the review: `fix: Address review — [description]`
+
+### 5. Report
+
+- Summarize what was implemented, what was deferred, and what needs discussion
+- Link any follow-up issues created for out-of-scope suggestions
+
+## Content scope
+
+Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/.claude/commands/review-code.md b/.claude/commands/review-code.md
new file mode 100644
index 0000000..1bb2bde
--- /dev/null
+++ b/.claude/commands/review-code.md
@@ -0,0 +1,353 @@
+---
+name: review-code
+description: Review code changes against engineering standards. Accepts a PR number, a SHA range (BASE..HEAD), the current branch's diff against main, staged changes, or a described scope ("the last 3 commits"). Audits CLAUDE.md adherence (reads root + per-directory CLAUDE.md), intent-vs-delivery (when given a plan/ADR/ticket), security, testing (TDD evidence + three-category coverage), conventions (conventional commits + no AI attribution), root-cause discipline, architecture (layering + scope), API contracts. Produces a structured report — Strengths, per-criterion PASS/WARN/FAIL, per-issue Critical/Important/Minor severity — ending with an explicit verdict (Approve / Request Changes / Needs Discussion) plus 1-2 sentence reasoning. Self-filters low-confidence findings; never flags pre-existing issues, lint/typecheck issues (CI handles those), or changes on unmodified lines. Use before merging a PR, before pushing a branch, or when reviewing a teammate's work. Do NOT use for proposing features (use brainstorm or arch-design), drafting implementation (use start-work or add-tests), standalone security audits (use security-check), or narrow style-only checks (a linter handles those).
+---
+
+# /review-code
+
+Review code changes against engineering standards. Produce a structured report with strengths, per-criterion audit, severity-tagged issues, and an explicit verdict.
+
+## Usage
+
+Point the skill at code being reviewed in any of these ways:
+
+- `/review-code [PR_NUMBER]` — fetch the PR diff via `gh pr view` + `gh pr diff`
+- `/review-code BASE_SHA..HEAD_SHA` — any git range
+- `/review-code` (no args) — the current branch's diff against its merge base with `main`
+- `/review-code --staged` — only staged changes (pre-commit scrutiny)
+- Or describe it: "review my changes in the current branch" / "review the last 3 commits"
+
+Optionally, provide intent context for delivery grading:
+
+- `plan=docs/design/<feature>.md`
+- `adr=docs/adr/<NNNN>-<title>.md`
+- `ticket=LINEAR-<id>` (fetches the ticket for comparison)
+
+When intent context is given, the review grades "does this match what was asked?" on top of code hygiene. When not, that section is marked N/A.
+
+## Execution Model
+
+For substantive reviews on large diffs: **dispatch the perspective passes as parallel sub-agents** via the Agent tool. Each sub-agent starts with a clean context window — the reviewer shouldn't inherit the implementer's mental model. For small single-commit tweaks, run inline.
+
+## Phase 0 — Eligibility Gate
+
+Before reviewing, confirm the review is worth running. Skip with a short note if any apply:
+
+- PR is closed or merged (reviewing merged code is after-the-fact observation, not a gate)
+- PR is a draft not marked ready-for-review
+- Change is an automated dep bump (dependabot, renovate) — trust the bot + CI
+- Change is trivial (whitespace-only, typo-only, revert with obvious justification)
+- A `/review-code` report already exists for this SHA range (no new commits since)
+
+If skipping, report: "Skipped — <reason>" and stop.
+
+## Phase 1 — Gather Context
+
+1. **Resolve the input to a concrete diff:**
+   - PR: `gh pr view <n> --json title,body,baseRefName,headRefName,files` + `gh pr diff <n>`
+   - SHA range: `git diff <base>..<head>` + `git log <base>..<head> --format='%h %s%n%b'`
+   - Current branch: `git merge-base HEAD main` → diff from there
+   - `--staged`: `git diff --cached`
+
+2. **Collect CLAUDE.md files to audit against:**
+   - Root project `CLAUDE.md` if present
+   - Any `CLAUDE.md` in directories whose files the diff modified
+   - Their paths go into the audit; their content guides the CLAUDE.md adherence criterion
+
+3. **If intent context was provided**, fetch and read it. Extract: stated goal, scope, non-goals, acceptance criteria, linked ADRs.
+
+4. **Scope summary** — record for the final report: file count, added/removed lines, commit count, touched modules.
+
+## Phase 2 — Multi-Perspective Pass
+
+Each perspective is a distinct review angle. For substantial changes, dispatch them as parallel sub-agents. For small changes, run sequentially inline.
+
+Follow `subagents.md` for the dispatch contract — each perspective's prompt needs explicit scope, pasted diff context, constraints (don't flag unmodified lines, don't rewrite the PR), and a required output format. Perspectives are read-only and independent, so parallel fan-out is always safe here.
+
+### Perspective A — CLAUDE.md Adherence
+
+Read the CLAUDE.md files collected in Phase 1. For each rule they state, check whether the diff honors it. CLAUDE.md is writing guidance, so not every rule applies at review time — focus on rules that are *assertions about what committed code should look like*.
+
+Example: if CLAUDE.md says "prefer `cast()` over `# type: ignore`," flag any `# type: ignore` in the diff. If it says "always run `make validate-parens` before commit," that's process guidance, not a reviewable code attribute.
+
+### Perspective B — Shallow Bug Scan
+
+Read the file changes in the diff. Ignore context beyond the changes themselves. Focus on **large, obvious bugs**:
+
+- Null / undefined dereferences on values the code can't guarantee
+- Off-by-one errors in boundary conditions
+- Incorrect error handling (swallowing exceptions, returning success on failure paths)
+- Data mutation under concurrent access without coordination
+- Incorrect SQL / query shape (wrong join, missing where clause)
+- Obvious crashes (`raise` without arguments, typed nil, etc.)
+
+Avoid small issues and nitpicks; those are for the Minor severity tier, if worth mentioning at all.
+
+### Perspective C — Git History Context
+
+Run `git blame` on the modified lines. Read recent commits that touched the same files. Check:
+
+- Is this change contradicting a recent deliberate choice? (Someone just fixed X; now this PR re-introduces it.)
+- Is there a pattern of the same bug being fixed repeatedly here?
+- Is the surrounding code style / convention consistent with what's being added?
+
+### Perspective D — Prior PR Comments
+
+`gh pr list` the PRs that previously touched these files. Check their review comments. If prior reviewers flagged a pattern, check whether the current diff repeats it.
+
+### Perspective E — Code Comments In Scope
+
+Read comments in the modified files (both touched and nearby). If the code has guidance ("DO NOT call this before X is initialized"), check the diff complies.
+
+## Phase 3 — Criteria Audit
+
+For each criterion below, report **PASS / WARN / FAIL** with file:line references. WARN and FAIL findings become issues in the Phase 4 summary.
+
+### Intent vs Delivery (when intent context provided)
+
+- Does the implementation match the plan / ADR / ticket?
+- Are acceptance criteria demonstrably met?
+- Scope creep: changes not in the plan?
+- Missing requirements: plan items unimplemented?
+
+Skip if no intent context; note "not evaluated" in the report.
+
+### Security
+
+- No hardcoded secrets, tokens, API keys, or credentials
+- All user input validated at system boundary
+- Parameterized queries only (no SQL string concatenation)
+- No sensitive data logged (PII, tokens, passwords)
+- Dependencies pinned and auditable
+
+### Testing (TDD Evidence)
+
+- Tests exist for all new code
+- Test commits **precede** implementation commits (TDD workflow)
+- Three categories covered: Normal, Boundary, Error per function; thorough on edge cases
+- Tests are independent — no shared mutable state
+- Mocking at external boundaries only (network, file I/O, time) — domain logic tested directly
+- Test naming follows project convention
+- Coverage does not decrease without justification
+- Parameter-heavy functions: author considered pairwise coverage via `/pairwise-tests`? (Not required; worth noting if missing.)
+
+### Conventions
+
+- Type annotations on all functions (including return types)
+- Conventional commit messages (`feat:`, `fix:`, `chore:`, etc.)
+- **No AI attribution anywhere** — code, comments, commits, PR descriptions — see `commits.md`
+- One logical change per commit
+- Docstrings on public functions/classes
+
+### Root Cause & Thoroughness
+
+- Bug fixes address the root cause, not surface symptoms
+- Changes demonstrate understanding of surrounding code
+- Edge cases covered comprehensively, not just the happy path
+
+### Architecture
+
+- Request handlers thin; business logic in services/domain
+- No unnecessary abstractions or over-engineering
+- Changes scoped to what was asked (no drive-by refactoring)
+- Stated architecture respected — if `.architecture/brief.md` exists, check conformance (see `arch-evaluate`)
+
+### API Contracts
+
+- New endpoints have typed contracts or schemas defined
+- No raw dict/object responses bypassing the contract layer
+- Client-side types match server-side output
+- Data flows through the API layer, not direct data access from handlers
+
+## Phase 4 — Filter and Categorize
+
+### Confidence Filter
+
+For each issue surfaced by any perspective or criterion, self-scrutinize: **am I really sure this is a real issue, or could I be wrong?** Rate your confidence honestly:
+
+- **High** — verified by reading the code; matches a pattern that causes bugs in practice; or a direct CLAUDE.md violation you can cite
+- **Medium** — likely real but contingent on context you didn't fully verify
+- **Low** — looks off but you can't confirm; might be a false positive
+
+**Drop Low-confidence issues before the final report.** Medium and High issues appear; Medium ones noted as such where uncertainty is relevant.
+
+### False-Positive Filter
+
+Do **not** flag any of these as issues:
+
+- **Pre-existing issues** on unmodified lines — note separately as "for follow-up," don't block this PR
+- **Lint / typecheck / test failures** — CI handles those; don't run builds yourself
+- **Nitpicks a senior engineer wouldn't call out** — unless CLAUDE.md explicitly requires them
+- **Style issues** — formatters and linters handle these
+- **Issues explicitly silenced in code** (e.g., `# type: ignore[...]` with a reason, lint ignore comments) unless the silencing is unjustified
+- **Intentional changes** in functionality clearly related to the PR's stated goal
+- **Changes in unmodified lines** (real issues in files the PR touches but on lines it doesn't change)
+- **Framework behavior being tested** — see `testing.md` anti-patterns
+
+### Severity Categorization
+
+Remaining issues get tagged:
+
+- **Critical** — merge-blockers: security holes, broken functionality, data loss risk, missing acceptance criteria, AI attribution in committed content
+- **Important** — should fix: architecture problems, test gaps, error handling holes, pattern violations
+- **Minor** — nice to have: missing docstrings, docstring drift, small optimizations
+
+## Phase 5 — Output
+
+```markdown
+# Code Review — <PR title / branch name / SHA range>
+
+**Scope:** <N files, +X / -Y lines, Z commits>
+**Input:** <pr#N / base..head / current branch>
+**Intent context:** <plan/ADR/ticket link, or "none provided">
+**CLAUDE.md files audited:** <list of paths>
+
+## Strengths
+
+Three minimum. Specific, with file:line.
+
+- <Specific positive>
+- <Specific positive>
+- <Specific positive>
+
+## Per-Criterion Audit
+
+| Criterion | Status | Notes |
+|---|---|---|
+| CLAUDE.md Adherence | PASS / WARN / FAIL | ... |
+| Intent vs Delivery | PASS / WARN / FAIL / N/A | ... |
+| Security | PASS / WARN / FAIL | ... |
+| Testing (TDD Evidence) | PASS / WARN / FAIL | ... |
+| Conventions | PASS / WARN / FAIL | ... |
+| Root Cause & Thoroughness | PASS / WARN / FAIL | ... |
+| Architecture | PASS / WARN / FAIL | ... |
+| API Contracts | PASS / WARN / FAIL | ... |
+
+## Issues
+
+### Critical (must fix before merge)
+
+1. **<Title>**
+   - File: `<path>:<line>`
+   - Problem: <what's wrong>
+   - Why it matters: <impact>
+   - Fix: <concrete suggestion>
+
+### Important (should fix)
+
+1. **<Title>** — `<file>:<line>` — <one-sentence description>
+
+### Minor (nice to have)
+
+1. **<Title>** — `<file>:<line>`
+
+## Recommendations
+
+Meta-level suggestions: process changes, follow-up tickets, architectural drift observations.
+
+## Verdict
+
+**<Approve / Request Changes / Needs Discussion>**
+
+**Reasoning:** <1-2 sentences. Grounded in the audit.>
+```
+
+## Critical Rules
+
+**DO:**
+- Categorize issues by actual severity — not everything is Critical
+- Cite specifics — `file:line`, not vague prose
+- Explain why each issue matters, not just what it is
+- Acknowledge strengths — mandatory; three minimum
+- Give a clear verdict with reasoning
+- Trust CI for lint, typecheck, test runs; don't re-run them
+- Self-filter low-confidence findings before reporting
+
+**DON'T:**
+- Say "looks good" without citing what was checked
+- Mark style nitpicks as Critical
+- Give feedback on code you didn't actually read
+- Flag pre-existing issues as PR-blockers
+- Use emojis or marketing-adjacent language
+- Skip the verdict or hedge it
+- Add any AI attribution to the output
+
+## Example Output
+
+```markdown
+# Code Review — feat: add inventory export CSV (#447)
+
+**Scope:** 8 files, +312 / -24 lines, 5 commits
+**Input:** PR #447
+**Intent context:** docs/design/inventory-export.md
+**CLAUDE.md files audited:** CLAUDE.md, api/CLAUDE.md
+
+## Strengths
+
+- Characterization tests added before refactor (`tests/test_inventory_export.py:12-89`) — TDD evidence clear across commit order
+- Export batches via generators rather than loading full inventory (`inventory/export.py:42-78`) — handles the 50k-row case without OOM
+- API schema versioned from day one (`api/schemas/export.py:5`) — cleaner than most first-endpoints
+
+## Per-Criterion Audit
+
+| Criterion | Status | Notes |
+|---|---|---|
+| CLAUDE.md Adherence | PASS | No `# type: ignore` without justification; conventional commits clean |
+| Intent vs Delivery | PASS | Matches plan §3.2; acceptance criteria met |
+| Security | WARN | User-provided filename not sanitized (see Important #1) |
+| Testing (TDD Evidence) | PASS | 5 test commits precede 3 implementation commits |
+| Conventions | PASS | All conventional, no AI attribution detected |
+| Root Cause & Thoroughness | PASS | Empty inventory, Unicode, large batches all covered |
+| Architecture | PASS | Handler thin; logic in inventory service |
+| API Contracts | PASS | Schema defined; TS types match in client/ |
+
+## Issues
+
+### Critical
+
+None.
+
+### Important
+
+1. **Filename injection via user input**
+   - File: `api/views/inventory.py:42`
+   - Problem: User-provided `filename` passed directly to `Content-Disposition` header
+   - Why it matters: CRLF injection → header smuggling; also sets up a path-traversal risk if ever used for disk writes
+   - Fix: `secure_filename(user_filename)` (werkzeug) or regex-strip to `[A-Za-z0-9._-]+`
+
+### Minor
+
+1. **Missing type on private helper** — `inventory/export.py:22` — `_chunked()` return type unspecified
+2. **Docstring drift** — `inventory/service.py:104` — docstring describes pre-batch behavior
+
+## Recommendations
+
+- Consider `/pairwise-tests` on the `export_options` function (4 parameters × 2-3 values each); currently 3 hand-written tests cover a fraction of the combinatorial space
+
+## Verdict
+
+**Request Changes**
+
+**Reasoning:** Core implementation is strong — TDD evidence, thin handler, proper schema, good edge coverage. The filename-injection issue is a must-fix before merge; once resolved this is a clean approve.
+```
+
+## Anti-Patterns
+
+- **All-caps everything.** If everything is Critical, nothing is. Be honest about severity.
+- **No Strengths section.** Reviews that only list problems are inaccurate — you missed what's right. Minimum three.
+- **Verdict without reasoning.** "Request Changes" alone is unhelpful. One or two sentences on why.
+- **Criteria skipped silently.** If you didn't check API contracts, say "N/A — no API changes in this diff," not silence.
+- **Reviewing code you didn't read.** Don't feedback on files you skimmed.
+- **Running the build yourself.** CI does that. Don't re-verify what CI is for.
+- **Hedging ("might be an issue").** If you can't commit to it, it's Low-confidence — drop it.
+
+## Hand-Off
+
+- **Critical** → must be addressed before merge; author fixes, re-review via `/review-code` on the updated SHA
+- **Important** → fix, or deliberately defer with an ADR (run `/arch-decide`)
+- **Minor** → follow-up issues or a cleanup PR
+- **Intent-vs-Delivery gaps** → either file tickets for the missing pieces or update the plan to reflect reality
+
+## Content scope
+
+Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/.claude/commands/security-check.md b/.claude/commands/security-check.md
new file mode 100644
index 0000000..ca431e0
--- /dev/null
+++ b/.claude/commands/security-check.md
@@ -0,0 +1,48 @@
+# /security-check — Audit Changes for Security Issues
+
+Scan staged or recent changes for secrets, OWASP vulnerabilities, and dependency risks.
+
+## Usage
+
+```
+/security-check [FILE_OR_DIRECTORY]
+```
+
+If no argument is given, audit all staged changes (`git diff --cached`). If there are no staged changes, audit the diff from the last commit.
+
+## Instructions
+
+1. **Gather the changes** to audit:
+   - Staged changes: `git diff --cached`
+   - Or last commit: `git diff HEAD~1`
+   - Or specific path if provided
+
+2. **Check for hardcoded secrets** — scan for patterns:
+   - AWS access keys (`AKIA...`)
+   - Generic secret patterns (`sk-`, `sk_live_`, `sk_test_`)
+   - Password assignments (`password=`, `passwd=`, `secret=`)
+   - Private keys (`-----BEGIN.*PRIVATE KEY-----`)
+   - `.env` file contents committed by mistake
+   - API tokens, JWTs, or bearer tokens in source code
+
+3. **OWASP Top 10 review**:
+   - SQL injection: string concatenation in queries
+   - XSS: unsanitized user input rendered in HTML/JSX
+   - Broken authentication: missing permission checks on endpoints
+   - Insecure deserialization: unsafe deserialization of untrusted data (e.g., eval, exec)
+   - Security misconfiguration: debug mode enabled in production settings
+   - Sensitive data exposure: PII or tokens in log statements
+
+4. **Dependency audit**:
+   - Run `pip-audit` if Python files changed
+   - Run `npm audit` if JavaScript/TypeScript files changed
+   - Flag any new dependencies added without version pinning
+
+5. **Report findings** in a table:
+
+   | Severity | File:Line | Finding | Recommendation |
+   |----------|-----------|---------|----------------|
+
+   Severity levels: CRITICAL, HIGH, MEDIUM, LOW, INFO
+
+6. If no issues found, report "No security issues detected" with a summary of what was checked.
diff --git a/.claude/commands/start-work.md b/.claude/commands/start-work.md
new file mode 100644
index 0000000..311476d
--- /dev/null
+++ b/.claude/commands/start-work.md
@@ -0,0 +1,269 @@
+---
+name: start-work
+description: Pick up a task (Linear ticket, GitHub issue, todo.org task, or a described scope) and take it through Claim, Justify, Approach, Implement, Verify, and Hand-off. Three user-approval gates separate the phases. The Justify gate covers benefits, costs, engineer/user impact, urgency, effort, alternatives, and a ticket-quality check. The Approach gate covers root cause, risk, refactor prerequisites, test strategy (unit, integration, e2e, pairwise, characterization), migration and backwards-compat, feature-flag question, commit decomposition, and branch name. Implementation uses TDD (red, green, edge cases, refactor audit of every touched file). The audit walks the whole of each touched file against a language-agnostic checklist; every finding is either fixed on this branch or filed as a ticket — nothing is silently dropped. A verify phase exercises the feature end-to-end in the local environment (Playwright against localhost for web projects, scripted manual test otherwise) before the final gate confirms readiness and hands off to the Review-and-Publish flow in commits.md. Use when starting work on a specific task where both "should we" and "how exactly" are worth deliberating. Do NOT use for open-ended bug investigation without a clear target (use debug first), for architectural paradigm exploration (use arch-design), for architectural decision recording (use arch-decide), when the task is trivial and obvious (just do it), or when requirements are still being shaped (use brainstorm).
+---
+
+# /start-work: pick up a task, justify it, plan it, build it
+
+Three review gates separate the phases. The user can redirect or kill the work at each one.
+
+1. **Claim.** Mark in-progress, assign, label, verify project.
+2. **Justify (gate 1).** Benefits, costs, impact, urgency, effort, alternatives, ticket quality. Stop for approval.
+3. **Approach (gate 2).** Root cause, risk, tests, migration, flag, commit decomposition. Stop for approval.
+4. **Implement.** TDD red, green, edge cases, refactor audit of every touched file.
+5. **Verify.** End-to-end or scripted manual test in the local environment.
+6. **Ready to commit (gate 3).** Report, stop for approval.
+7. **Hand off** to the Review-and-Publish flow in `commits.md`.
+
+## Usage
+
+```
+/start-work <task-ref>
+```
+
+`<task-ref>` can be:
+
+- A Linear ticket ID or URL: `SE-170`, `https://linear.app/deepsat/issue/SE-170`
+- A GitHub issue URL or number
+- A todo.org heading reference or description: `todo.org:mission-sync refactor`
+- A free-form scope description: "update the mission-card fallback"
+
+If the reference is ambiguous, ask the user to clarify before proceeding.
+
+## Phase 0: eligibility
+
+Skip with a short note and stop if any apply:
+
+- Task is already Done, closed, or merged.
+- Task is assigned to someone else and the user has not asked to take it over.
+- Task is an obvious duplicate of something in-progress.
+- Task description is so vague that even the Justify gate cannot engage. Route to `/brainstorm`.
+
+## Phase 1: claim
+
+Make ownership explicit before any other work starts. The exact steps depend on where the task lives.
+
+### Linear ticket
+
+1. Fetch the ticket with the Linear MCP tools.
+2. Move status to **In Progress**.
+3. Assign the user. If another assignee is already present, add the user as a second assignee. If the Linear API does not accept multiple assignees, post a comment ("Picking this up alongside <existing assignee>") and proceed.
+4. Verify the ticket has exactly one of these labels: **Bug**, **Test**, **Chore**, or **Feature**. If missing or wrong, ask the user which applies and set it.
+5. Verify the ticket's project. If unset or wrong, ask the user which project it belongs to.
+
+### GitHub issue
+
+1. Fetch with `gh issue view <n> --json title,body,state,assignees,labels`.
+2. Assign to the user: `gh issue edit <n> --add-assignee @me`.
+3. Verify the Bug / Test / Chore / Feature label. Add if missing.
+4. Post a comment noting you are starting work.
+
+### Todo.org task
+
+1. Locate the heading the user referenced.
+2. Change the TODO keyword to `DOING`.
+3. Add exactly one tag: `:bug:`, `:feature:`, `:test:`, or `:chore:`. Ask the user which applies if none is obvious. Todo.org is personal, so there is no assignee step.
+
+### Unticketed
+
+1. Note in the session that the work is unticketed.
+2. Ask the user whether to create a ticket or issue retroactively before continuing. If no, proceed but flag in the final commit message that there is no linked ticket.
+
+## Phase 2: justify (gate 1)
+
+Read the task description end to end. Skim the code it references.
+
+Then produce a justification that covers all of these, concisely:
+
+1. **Benefits.** What is better after this lands? Concrete, not abstract.
+2. **Costs.** Time, risk, reviewer bandwidth, ceremony overhead.
+3. **Engineer impact.** Does it make someone's life easier? Catch a class of bug? Remove friction?
+4. **End-user impact.** Behavioral change? Visible? Invisible-but-protective?
+5. **Downsides.** What do we lose? Where would we regret doing this?
+6. **Urgency and priority fit.** Does this align with current goals or an upcoming deadline? If the project has committed deadlines, explicitly check this against them. Anything not obviously on the critical path should be called out as "deferrable."
+7. **Effort estimate.** S (under 1 hour), M (1 hour to 1 day), L (over 1 day). Rough is fine.
+8. **Alternatives considered.** Is there a cheaper way? Can we defer? Can we address the root cause via a different path?
+9. **Ticket quality check.** Is scope clear, are acceptance criteria concrete, are reproduction steps present for bugs? If **not clear**, stop and ask the user to choose one of:
+   - (a) Bounce to `/brainstorm` to refine the ticket first.
+   - (b) Ping the ticket author for clarification.
+   - (c) Supply the missing info themselves right now, if it is easy for them to do so.
+
+### Gate
+
+Present the justification to the user. Stop. Wait for questions and explicit approval ("approved", "proceed", or equivalent) before starting Phase 3.
+
+Do not generate the approach while waiting. The user may kill the task at this gate, and any pre-generated approach would be wasted work.
+
+If the user kills the task, roll back the Phase 1 claim: move the ticket back to its prior status, remove the assignment you added, and remove the label you added (if any).
+
+## Phase 3: approach (gate 2)
+
+Read the referenced code end to end. Understand the surrounding context: callers, callees, existing tests, adjacent modules.
+
+Then produce an approach that covers:
+
+1. **Root cause.** For bugs, where the bug originates, not just where it surfaces. For features, which layer owns the new behavior.
+2. **Code that changes.** Files and functions, with a rough line-count estimate.
+3. **Risk.** Who and what does this affect? Local (one file) or does it ripple? Flag anything that touches shared state, public APIs, or core data flow.
+4. **Refactor prerequisites.** Does the codebase need restructuring before this fix is easy? If yes, that is a separate ticket and should be done first.
+5. **Characterization tests.** If modifying existing untested code, write characterization tests first to lock behavior before changing it (see `testing.md`).
+6. **Test strategy decomposition.** Which of these are needed, and roughly how many of each:
+   - Unit tests.
+   - Integration tests.
+   - E2E tests.
+   - Pairwise or combinatorial tests, if parameter-heavy (see `/pairwise-tests`).
+7. **Migration and backwards-compat surface.** DB migration? API contract change? Frontend consumer impact? Config shape change? Flag if yes and describe the scope.
+8. **Feature flag.** Does this ship behind a flag or direct? Always worth asking once.
+9. **Commit decomposition.** One commit, or N commits? Each commit should be one logical change per `commits.md`. Size the Review-and-Publish ceremony ahead of time.
+10. **Branch name.** Following the project convention: `fix/<ID>-slug`, `feature/<ID>-slug`, `chore/<ID>-slug`, or `test/<ID>-slug`. Unticketed work uses a short descriptive slug.
+
+### Gate
+
+Present the approach to the user. Stop. Wait for questions and explicit approval before starting Phase 4.
+
+If the user redirects the approach, update the plan and re-present rather than silently adjusting during implementation.
+
+## Phase 4: implement (TDD)
+
+Follow the red-green-refactor cycle from `testing.md`.
+
+1. **Create the branch** using the name decided in Phase 3.
+2. **Red.** Write a failing test that demonstrates the bug or captures the new desired behavior. Run it. Confirm it fails for the right reason, not because the test itself is broken. Commit as `test: <desc>`.
+3. **Green.** Write the minimal code to make the test pass. Do not generalize yet. Do not add features the test does not require. Commit as `fix:` or `feat:`.
+4. **Edge cases.** Add tests in all three categories per `testing.md`:
+   - Normal: happy path, typical input.
+   - Boundary: empty inputs, nulls, minimum and maximum values, single-element collections, Unicode, long strings, time and timezone boundaries, concurrent access.
+   - Error: invalid inputs, missing required parameters, permission denied, resource exhaustion, malformed data, network failures.
+   Commit as `test: add edge cases for <desc>`.
+5. **Refactor audit.** After tests are green, audit every file you touched in this task — not just the code you wrote, but the whole file, top to bottom. The question is no longer "is my new code clean?" but "what refactoring opportunities exist in this file, and which belong on this branch versus a follow-up ticket?"
+
+   Work the touched-file list explicitly. For each file, walk the checklist below (a through h), note every candidate, and decide its disposition. "Touching" a file means any modification, however small — a one-line edit still qualifies the whole file for audit. Keep tests green throughout. If they go red during a change, you have altered behavior, not just form — stop and decide whether the change is intentional before proceeding.
+
+   The checklist is language-agnostic. The same smells appear in Python, TypeScript, Go, Elisp, Rust, shell, SQL, and anything else.
+
+   a. **Stale documentation.** Comments, docstrings, file headers, module-level summaries, READMEs, ADRs, architectural diagrams, or any prose that now contradicts the code. Update or delete. Prefer deletion when the documentation duplicates what the code, the tooling, or the runtime config already communicates — duplicated information is rotted documentation waiting to happen. The test: if a future reader would learn nothing from the doc that the code does not already say, drop it.
+
+   b. **Duplication.** Three distinct kinds:
+      - *Logic duplication*: the same computation or control flow appearing in multiple places. Extract when it appears three or more times, or when the duplication crosses an abstraction boundary, or when a future divergence between the copies would be a real bug. Two occurrences of a simple expression usually does not justify extraction. Three similar lines beats a premature abstraction.
+      - *Literal duplication*: repeated strings, regexes, magic numbers, paths, URLs, error codes, keywords — any value that would need to change together. A shared constant is cheap insurance and makes the intent explicit.
+      - *Intra-function expression duplication*: the same non-trivial expression evaluated twice inside one scope. Bind it to a local name once. Shorter function and no risk of the two expressions drifting apart when someone edits one.
+
+   c. **Naming drift.** Names that describe what the identifier used to do, not what it does now. Names that mix abstraction levels (a high-level operation named after its implementation detail). Inconsistency across the module: `get_foo` next to `fetch_bar` next to `load_baz` for operations that are semantically the same. Pick one verb per concept and rename. Renaming is cheap in any language with a competent tool, and clarity compounds.
+
+   d. **Scope and cohesion.** Functions doing two things — a name with "and" or two clauses joined by commas is the tell. Split. Related functions scattered across the file. Cluster them with a comment header or section break. Unrelated functions grouped only by a superficial property (all private, all on the same keybinding, all using the same framework feature). Group by purpose, not accident. Code reads like a book. Related concepts should be neighbors.
+
+   e. **Premature abstraction.** Helpers with one caller that do not document intent better than the inline version — inline them. Parameters always passed the same value by every caller — drop them. Configuration knobs that no caller varies — delete them. Interfaces with a single implementation and no realistic second — collapse them. Abstractions built "for future flexibility" that have not been exercised are carrying cost with no benefit. Speculative generality is a tax you pay on every read.
+
+   f. **Dead code.** Unused imports, uncalled functions, variables never referenced, parameters never consumed inside the body, types no one uses. Commented-out blocks kept "in case we need it later." You will not need them, and if you do, the version control history has them. Delete.
+
+   g. **Error handling parity.** Similar operations emitting different error shapes (exceptions vs. return values vs. log-and-continue vs. silent swallow). Error messages that expose internal state unhelpfully, or that strip the context a caller needs to act. Guards present in some parallel paths but missing in others. Parity beats novelty — if three siblings behave the same way, the fourth should too, or have a documented reason not to.
+
+   h. **Test smells.** Tests are code and rot the same way. Copy-pasted fixtures that should parametrize. Assertions that lock to implementation (exact strings, internal structure, field order) rather than behavior. Dead mocks that stub something the test no longer exercises. Mocks of internal helpers rather than external boundaries. See `testing.md` "Signs of overmocking."
+
+   i. **Out-of-file scope.** The audit stops at the touched-file boundary. If you happen to notice a smell in a file you did not touch, do not expand the audit into it — file a ticket so the finding is not lost. Drive-by audits across the codebase balloon review time and break the working set. Exception: a rename or structural change that would leave the codebase inconsistent if shipped half-done is in scope and required.
+
+   **Disposition for each candidate.** Every candidate must land in one of three buckets. There is no silent drop.
+
+   - *Fix now, fold into the related feature or fix commit*: small, directly related to the task's work on this file, obviously clearer, no new risk surface.
+   - *Fix now, separate `refactor:` commit on this branch*: related to the surface you touched but larger in scope, or reshaping something non-trivial. Separating it keeps the feature commit focused for review.
+   - *File a ticket or todo.org entry*: the smell is real but unrelated to this task, lives in a touched file outside the task's working set, or was noticed in an untouched file. Filing — not skipping — is the default for anything that does not fit the two "fix now" cases. Capture enough detail that a future session can act on it: file path, line or function, smell category (one of a through h), and a one-line description.
+
+   If a candidate feels too small to fix and too small to file, it was either not a real smell, or you are talking yourself out of a two-line todo entry. Write the entry.
+
+   **Stop conditions.** The *audit* is complete when every touched file has been walked and every candidate has a disposition. The *fixing* stops earlier: ask "would a reasonable reviewer flag this?" of the remaining in-scope candidates. If the answer is no, stop fixing and file the rest. Shipping beats polishing, but filing beats forgetting.
+
+   Commit: group meaningful refactors into a `refactor: <desc>` commit when they stand on their own. Fold small tweaks into the associated feature or fix commit when they are tied to the same scope. The commit history should let a future reader see intent per commit, not a mixture of "did the thing" and "also cleaned up five unrelated corners."
+
+### Constraints
+
+- **Root cause, not symptom.** If the task is a bug, fix where the bug originates, not where it surfaces.
+- **No drive-by refactoring.** Only change code the task requires. Unrelated cleanups go in a separate ticket.
+- **No hypothetical-future code.** Solve the current problem. Do not design for requirements that have not been asked.
+- **Framework and library code is trusted.** Mock at boundaries (network, time, file I/O), not at internal helpers (see `testing.md` "Signs of overmocking").
+
+## Phase 5: verify end-to-end
+
+Unit tests prove the internals are green. They cannot prove the feature works for the user. Before the ready-to-commit gate, exercise the feature end-to-end on your local machine — a running dev server on localhost for web work, the actual editor or CLI for everything else. Never production. Production verification is a separate concern that belongs to release procedures, not to a pre-merge workflow. Skipping this phase is how "all tests green" becomes "shipped broken" — it caught a one-second browser-open timeout in local testing that no unit test had any way to see.
+
+Pick the verification mode that matches the project's stack.
+
+### If the project has browser-automatable UI
+
+Web apps, dashboards, SPAs, admin tools, any feature reachable through a browser. Write a Playwright end-to-end test that exercises:
+
+- The happy path the feature was built for, clicking through as a user would.
+- Any boundary or error cases that unit tests could not reach: authentication, cross-page navigation, state across reloads, deep-link URLs, permission-denied flows.
+- The user-observable failure mode of any known upstream dependency, mocked or stubbed where needed.
+
+The E2E test lives in the repo alongside the feature and runs in CI like any other test. Delegate the test authoring to `/playwright-js` for JavaScript or TypeScript stacks, `/playwright-py` for Python stacks. Do not write Playwright code from scratch when those skills are available.
+
+### If the project has no browser UI
+
+CLI tools, libraries, Emacs or editor configuration, shell scripts, daemons, anything where there is no DOM to automate. Lead the user through a scripted manual test. Provide:
+
+1. **An explicit sequence of steps.** Specific commands to run, specific keys to press, specific files to open. Not "try the feature" but "open file X, press C-; h d, pick draft Y."
+2. **The expected observable outcome at each step.** What message should appear in the echo area, what buffer should show, what file should change on disk, what exit code the process should return, what the browser should display. One expected outcome per step so failures pinpoint where.
+3. **Failure signals.** What broken looks like. "If you see nothing in the echo area, the binding did not fire. If you see `No #+hugo_draft keyword`, the buffer has no Hugo front matter." Pattern-matching against known failure modes shortens diagnosis.
+
+Wait for the user to walk through the steps and report back. Do not skip ahead. Do not assume success without the user's confirmation. If the user reports a failure, route the failure back through Phase 3 (if the approach was wrong) or Phase 4 (if the implementation was wrong), then re-verify.
+
+### In both modes
+
+- **Run against a clean environment.** Restart the process, clear the cache, open a fresh browser session, re-evaluate the loaded module. Stale state masks real bugs — today's "toggling the draft doesn't work" turned out to be stale code in a running Emacs.
+- **Verify failure paths, not just the happy path.** A feature that works when nothing goes wrong is half-tested. Force an error path if the feature has one.
+- **If verification reveals a unit-test gap, add the missing unit test before gate 3.** A bug you hit manually is a bug worth locking in with a test so it cannot regress.
+- **Keep the verification artifact.** For browser work, the Playwright test stays in the repo. For manual scripts, paste the steps into the Phase 6 handoff report so a reviewer can re-verify on request.
+
+### Stop condition
+
+Every verified scenario produces its expected observable outcome. Any failure is routed back to Phase 3 or Phase 4 — not papered over, not marked as "known issue" without filing a follow-up ticket.
+
+## Phase 6: ready to commit (gate 3)
+
+Before handing off to the Review-and-Publish flow, stop and report:
+
+- What was done. Files changed, tests added, test-suite result.
+- What was verified in Phase 5, and how. For manual scripts, paste the step list so a reviewer can re-run the verification. For Playwright tests, name the test file.
+- Any deviations from the Phase 3 approach that happened during implementation, and why.
+- Follow-up tickets filed during the refactor audit, listed by ID so the reviewer can see what was deferred and why. "Surfaced" is not enough — these are actually filed before gate 3 clears, not left as a mental note.
+
+Wait for explicit approval before starting the commit and PR ceremony.
+
+If deviations are significant, the user may want to loop back and revise the approach before publishing.
+
+## Phase 7: hand off to Review-and-Publish
+
+Follow `commits.md` exactly. Summary of the flow:
+
+1. Run `/review-code --staged` before each commit, or `/review-code` on the whole branch before the PR. Block on Critical or Important findings.
+2. Draft the commit message to `/tmp/commit-<slug>.md`. Run `humanizer`. Apply the plain-English pass. Replace semicolons. Stop for approval.
+3. After approval, commit.
+4. Draft the PR body to `/tmp/pr-<ticket-or-slug>.md`. Body must include a `Linear:` or equivalent cross-link line. Run `humanizer`. Apply the plain-English pass. Replace semicolons. Stop for approval.
+5. After approval, push and run `gh pr create`.
+6. Post the PR URL back to the Linear ticket, GitHub issue, or todo.org entry.
+7. Move the Linear or GitHub status to **Dev Review**. Todo.org has no equivalent. Leave the todo.org entry as `DOING` until the PR merges.
+
+## Anti-patterns
+
+- **Skipping the Justify gate.** "This is obviously worth doing" is exactly what the gate exists to verify. If the answer really is obvious, the gate takes thirty seconds.
+- **Skipping the Approach gate.** Implementation without a plan is how scope creep happens. It is also how the user loses the chance to redirect.
+- **Marking a task In Progress before Phase 2 approval.** If the Justify gate kills the task, the Claim should roll back cleanly.
+- **Blurring the gates.** Write the justification, stop, wait. Do not pre-generate the approach while waiting. The user may kill the task and the pre-work gets wasted.
+- **Treating Feature tasks as skippable on the Approach gate.** Features especially need the migration, backwards-compat, and feature-flag questions answered up front.
+- **Letting the TDD cycle drift.** If the test passes before the implementation is written, the test is wrong. Confirm the red before moving to green.
+- **Skipping the refactor audit.** A green test suite is necessary, not sufficient. Walking the touched-file list against the refactor checklist catches the stale comment, the naming drift, and the duplicated expression that a reviewer will otherwise flag. Leave the code better than you found it, within scope — and file what you cannot fix on this branch.
+- **Auditing only the code you wrote.** "I only changed one line, the rest of the file isn't my problem" — it is, to the extent that you file what you see. The audit is per touched file, not per diff hunk. Anything noticed in a touched file lands somewhere: this branch or a ticket.
+- **Skipping the verify phase.** Green unit tests do not mean the feature works for the user. A one-second delay that looks fine on a mocked process is a broken experience on a real Hugo build. Five minutes of scripted manual testing or a Playwright run catches the gap before a reviewer does.
+
+## Cross-references
+
+- `commits.md`: the Review-and-Publish flow used in Phase 7.
+- `testing.md`: TDD discipline, edge case categories, characterization tests, overmocking signals.
+- `subagents.md`: dispatch contract for parallel code research during Phase 3 if the code surface is large.
+- `/review-code`: runs inside Phase 7.
+- `/brainstorm`: route here from the Phase 2 ticket-quality branch.
+- `/arch-design`: route here if Phase 3 reveals an architectural question the task cannot answer on its own.
+- `/arch-decide`: route here if Phase 3 surfaces a decision worth recording as an ADR.
+- `/debug`: route here if Phase 2 reveals the task needs investigation before it can be justified.
+- `/pairwise-tests`: route here from Phase 3 if the test matrix warrants combinatorial coverage.
+- `/playwright-js`, `/playwright-py`: route here from Phase 5 to author E2E tests for web projects.
diff --git a/arch-decide/LICENSE b/arch-decide/LICENSE
deleted file mode 100644
index 8a8b38e..0000000
--- a/arch-decide/LICENSE
+++ /dev/null
@@ -1,25 +0,0 @@
-MIT License
-
-Copyright (c) 2024 Seth Hobson
-
-Permission is hereby granted, free of charge, to any person obtaining a copy
-of this software and associated documentation files (the "Software"), to deal
-in the Software without restriction, including without limitation the rights
-to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
-copies of the Software, and to permit persons to whom the Software is
-furnished to do so, subject to the following conditions:
-
-The above copyright notice and this permission notice shall be included in all
-copies or substantial portions of the Software.
-
-THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
-IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
-FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
-AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
-LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
-OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
-SOFTWARE.
-
----
-
-Origin: https://github.com/wshobson/agents/blob/main/plugins/documentation-generation/skills/architecture-decision-records/SKILL.md
diff --git a/arch-decide/SKILL.md b/arch-decide/SKILL.md
deleted file mode 100644
index b0078fb..0000000
--- a/arch-decide/SKILL.md
+++ /dev/null
@@ -1,451 +0,0 @@
----
-name: arch-decide
-description: Create, update, and manage Architecture Decision Records (ADRs) that capture significant technical decisions — context, options, chosen approach, consequences. Five template variants (MADR, Nygard, Y-statement, lightweight, RFC). Covers ADR lifecycle (proposed → accepted → deprecated / superseded), review process, and adr-tools automation. Use when documenting an architectural choice, reviewing past decisions, or establishing a decision process. Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
----
-
-# Architecture Decision Records
-
-Comprehensive patterns for creating, maintaining, and managing Architecture Decision Records (ADRs) that capture the context and rationale behind significant technical decisions.
-
-## When to Use This Skill
-
-- Making significant architectural decisions
-- Documenting technology choices
-- Recording design trade-offs
-- Onboarding new team members
-- Reviewing historical decisions
-- Establishing decision-making processes
-
-## Core Concepts
-
-### 1. What is an ADR?
-
-An Architecture Decision Record captures:
-
-- **Context**: Why we needed to make a decision
-- **Decision**: What we decided
-- **Consequences**: What happens as a result
-
-### 2. When to Write an ADR
-
-| Write ADR                  | Skip ADR               |
-| -------------------------- | ---------------------- |
-| New framework adoption     | Minor version upgrades |
-| Database technology choice | Bug fixes              |
-| API design patterns        | Implementation details |
-| Security architecture      | Routine maintenance    |
-| Integration patterns       | Configuration changes  |
-
-### 3. ADR Lifecycle
-
-```
-Proposed → Accepted → Deprecated → Superseded
-              ↓
-           Rejected
-```
-
-## Templates
-
-### Template 1: Standard ADR (MADR Format)
-
-```markdown
-# ADR-0001: Use PostgreSQL as Primary Database
-
-## Status
-
-Accepted
-
-## Context
-
-We need to select a primary database for our new e-commerce platform. The system
-will handle:
-
-- ~10,000 concurrent users
-- Complex product catalog with hierarchical categories
-- Transaction processing for orders and payments
-- Full-text search for products
-- Geospatial queries for store locator
-
-The team has experience with MySQL, PostgreSQL, and MongoDB. We need ACID
-compliance for financial transactions.
-
-## Decision Drivers
-
-- **Must have ACID compliance** for payment processing
-- **Must support complex queries** for reporting
-- **Should support full-text search** to reduce infrastructure complexity
-- **Should have good JSON support** for flexible product attributes
-- **Team familiarity** reduces onboarding time
-
-## Considered Options
-
-### Option 1: PostgreSQL
-
-- **Pros**: ACID compliant, excellent JSON support (JSONB), built-in full-text
-  search, PostGIS for geospatial, team has experience
-- **Cons**: Slightly more complex replication setup than MySQL
-
-### Option 2: MySQL
-
-- **Pros**: Very familiar to team, simple replication, large community
-- **Cons**: Weaker JSON support, no built-in full-text search (need
-  Elasticsearch), no geospatial without extensions
-
-### Option 3: MongoDB
-
-- **Pros**: Flexible schema, native JSON, horizontal scaling
-- **Cons**: No ACID for multi-document transactions (at decision time),
-  team has limited experience, requires schema design discipline
-
-## Decision
-
-We will use **PostgreSQL 15** as our primary database.
-
-## Rationale
-
-PostgreSQL provides the best balance of:
-
-1. **ACID compliance** essential for e-commerce transactions
-2. **Built-in capabilities** (full-text search, JSONB, PostGIS) reduce
-   infrastructure complexity
-3. **Team familiarity** with SQL databases reduces learning curve
-4. **Mature ecosystem** with excellent tooling and community support
-
-The slight complexity in replication is outweighed by the reduction in
-additional services (no separate Elasticsearch needed).
-
-## Consequences
-
-### Positive
-
-- Single database handles transactions, search, and geospatial queries
-- Reduced operational complexity (fewer services to manage)
-- Strong consistency guarantees for financial data
-- Team can leverage existing SQL expertise
-
-### Negative
-
-- Need to learn PostgreSQL-specific features (JSONB, full-text search syntax)
-- Vertical scaling limits may require read replicas sooner
-- Some team members need PostgreSQL-specific training
-
-### Risks
-
-- Full-text search may not scale as well as dedicated search engines
-- Mitigation: Design for potential Elasticsearch addition if needed
-
-## Implementation Notes
-
-- Use JSONB for flexible product attributes
-- Implement connection pooling with PgBouncer
-- Set up streaming replication for read replicas
-- Use pg_trgm extension for fuzzy search
-
-## Related Decisions
-
-- ADR-0002: Caching Strategy (Redis) - complements database choice
-- ADR-0005: Search Architecture - may supersede if Elasticsearch needed
-
-## References
-
-- [PostgreSQL JSON Documentation](https://www.postgresql.org/docs/current/datatype-json.html)
-- [PostgreSQL Full Text Search](https://www.postgresql.org/docs/current/textsearch.html)
-- Internal: Performance benchmarks in `/docs/benchmarks/database-comparison.md`
-```
-
-### Template 2: Lightweight ADR
-
-```markdown
-# ADR-0012: Adopt TypeScript for Frontend Development
-
-**Status**: Accepted
-**Date**: 2024-01-15
-**Deciders**: @alice, @bob, @charlie
-
-## Context
-
-Our React codebase has grown to 50+ components with increasing bug reports
-related to prop type mismatches and undefined errors. PropTypes provide
-runtime-only checking.
-
-## Decision
-
-Adopt TypeScript for all new frontend code. Migrate existing code incrementally.
-
-## Consequences
-
-**Good**: Catch type errors at compile time, better IDE support, self-documenting
-code.
-
-**Bad**: Learning curve for team, initial slowdown, build complexity increase.
-
-**Mitigations**: TypeScript training sessions, allow gradual adoption with
-`allowJs: true`.
-```
-
-### Template 3: Y-Statement Format
-
-```markdown
-# ADR-0015: API Gateway Selection
-
-In the context of **building a microservices architecture**,
-facing **the need for centralized API management, authentication, and rate limiting**,
-we decided for **Kong Gateway**
-and against **AWS API Gateway and custom Nginx solution**,
-to achieve **vendor independence, plugin extensibility, and team familiarity with Lua**,
-accepting that **we need to manage Kong infrastructure ourselves**.
-```
-
-### Template 4: ADR for Deprecation
-
-```markdown
-# ADR-0020: Deprecate MongoDB in Favor of PostgreSQL
-
-## Status
-
-Accepted (Supersedes ADR-0003)
-
-## Context
-
-ADR-0003 (2021) chose MongoDB for user profile storage due to schema flexibility
-needs. Since then:
-
-- MongoDB's multi-document transactions remain problematic for our use case
-- Our schema has stabilized and rarely changes
-- We now have PostgreSQL expertise from other services
-- Maintaining two databases increases operational burden
-
-## Decision
-
-Deprecate MongoDB and migrate user profiles to PostgreSQL.
-
-## Migration Plan
-
-1. **Phase 1** (Week 1-2): Create PostgreSQL schema, dual-write enabled
-2. **Phase 2** (Week 3-4): Backfill historical data, validate consistency
-3. **Phase 3** (Week 5): Switch reads to PostgreSQL, monitor
-4. **Phase 4** (Week 6): Remove MongoDB writes, decommission
-
-## Consequences
-
-### Positive
-
-- Single database technology reduces operational complexity
-- ACID transactions for user data
-- Team can focus PostgreSQL expertise
-
-### Negative
-
-- Migration effort (~4 weeks)
-- Risk of data issues during migration
-- Lose some schema flexibility
-
-## Lessons Learned
-
-Document from ADR-0003 experience:
-
-- Schema flexibility benefits were overestimated
-- Operational cost of multiple databases was underestimated
-- Consider long-term maintenance in technology decisions
-```
-
-### Template 5: Request for Comments (RFC) Style
-
-```markdown
-# RFC-0025: Adopt Event Sourcing for Order Management
-
-## Summary
-
-Propose adopting event sourcing pattern for the order management domain to
-improve auditability, enable temporal queries, and support business analytics.
-
-## Motivation
-
-Current challenges:
-
-1. Audit requirements need complete order history
-2. "What was the order state at time X?" queries are impossible
-3. Analytics team needs event stream for real-time dashboards
-4. Order state reconstruction for customer support is manual
-
-## Detailed Design
-
-### Event Store
-
-```
-OrderCreated { orderId, customerId, items[], timestamp }
-OrderItemAdded { orderId, item, timestamp }
-OrderItemRemoved { orderId, itemId, timestamp }
-PaymentReceived { orderId, amount, paymentId, timestamp }
-OrderShipped { orderId, trackingNumber, timestamp }
-```
-
-### Projections
-
-- **CurrentOrderState**: Materialized view for queries
-- **OrderHistory**: Complete timeline for audit
-- **DailyOrderMetrics**: Analytics aggregation
-
-### Technology
-
-- Event Store: EventStoreDB (purpose-built, handles projections)
-- Alternative considered: Kafka + custom projection service
-
-## Drawbacks
-
-- Learning curve for team
-- Increased complexity vs. CRUD
-- Need to design events carefully (immutable once stored)
-- Storage growth (events never deleted)
-
-## Alternatives
-
-1. **Audit tables**: Simpler but doesn't enable temporal queries
-2. **CDC from existing DB**: Complex, doesn't change data model
-3. **Hybrid**: Event source only for order state changes
-
-## Unresolved Questions
-
-- [ ] Event schema versioning strategy
-- [ ] Retention policy for events
-- [ ] Snapshot frequency for performance
-
-## Implementation Plan
-
-1. Prototype with single order type (2 weeks)
-2. Team training on event sourcing (1 week)
-3. Full implementation and migration (4 weeks)
-4. Monitoring and optimization (ongoing)
-
-## References
-
-- [Event Sourcing by Martin Fowler](https://martinfowler.com/eaaDev/EventSourcing.html)
-- [EventStoreDB Documentation](https://www.eventstore.com/docs)
-```
-
-## ADR Management
-
-### Directory Structure
-
-```
-docs/
-├── adr/
-│   ├── README.md           # Index and guidelines
-│   ├── template.md         # Team's ADR template
-│   ├── 0001-use-postgresql.md
-│   ├── 0002-caching-strategy.md
-│   ├── 0003-mongodb-user-profiles.md  # [DEPRECATED]
-│   └── 0020-deprecate-mongodb.md      # Supersedes 0003
-```
-
-### ADR Index (README.md)
-
-```markdown
-# Architecture Decision Records
-
-This directory contains Architecture Decision Records (ADRs) for [Project Name].
-
-## Index
-
-| ADR                                   | Title                              | Status     | Date       |
-| ------------------------------------- | ---------------------------------- | ---------- | ---------- |
-| [0001](0001-use-postgresql.md)        | Use PostgreSQL as Primary Database | Accepted   | 2024-01-10 |
-| [0002](0002-caching-strategy.md)      | Caching Strategy with Redis        | Accepted   | 2024-01-12 |
-| [0003](0003-mongodb-user-profiles.md) | MongoDB for User Profiles          | Deprecated | 2023-06-15 |
-| [0020](0020-deprecate-mongodb.md)     | Deprecate MongoDB                  | Accepted   | 2024-01-15 |
-
-## Creating a New ADR
-
-1. Copy `template.md` to `NNNN-title-with-dashes.md`
-2. Fill in the template
-3. Submit PR for review
-4. Update this index after approval
-
-## ADR Status
-
-- **Proposed**: Under discussion
-- **Accepted**: Decision made, implementing
-- **Deprecated**: No longer relevant
-- **Superseded**: Replaced by another ADR
-- **Rejected**: Considered but not adopted
-```
-
-### Automation (adr-tools)
-
-```bash
-# Install adr-tools
-brew install adr-tools
-
-# Initialize ADR directory
-adr init docs/adr
-
-# Create new ADR
-adr new "Use PostgreSQL as Primary Database"
-
-# Supersede an ADR
-adr new -s 3 "Deprecate MongoDB in Favor of PostgreSQL"
-
-# Generate table of contents
-adr generate toc > docs/adr/README.md
-
-# Link related ADRs
-adr link 2 "Complements" 1 "Is complemented by"
-```
-
-## Review Process
-
-```markdown
-## ADR Review Checklist
-
-### Before Submission
-
-- [ ] Context clearly explains the problem
-- [ ] All viable options considered
-- [ ] Pros/cons balanced and honest
-- [ ] Consequences (positive and negative) documented
-- [ ] Related ADRs linked
-
-### During Review
-
-- [ ] At least 2 senior engineers reviewed
-- [ ] Affected teams consulted
-- [ ] Security implications considered
-- [ ] Cost implications documented
-- [ ] Reversibility assessed
-
-### After Acceptance
-
-- [ ] ADR index updated
-- [ ] Team notified
-- [ ] Implementation tickets created
-- [ ] Related documentation updated
-```
-
-## Best Practices
-
-### Do's
-
-- **Write ADRs early** - Before implementation starts
-- **Keep them short** - 1-2 pages maximum
-- **Be honest about trade-offs** - Include real cons
-- **Link related decisions** - Build decision graph
-- **Update status** - Deprecate when superseded
-
-### Don'ts
-
-- **Don't change accepted ADRs** - Write new ones to supersede
-- **Don't skip context** - Future readers need background
-- **Don't hide failures** - Rejected decisions are valuable
-- **Don't be vague** - Specific decisions, specific consequences
-- **Don't forget implementation** - ADR without action is waste
-
----
-
-## Attribution
-
-Forked from [wshobson/agents](https://github.com/wshobson/agents) — MIT licensed.
-See `LICENSE` in this directory for the original copyright and terms.
-
-## Content scope
-
-Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/arch-design/SKILL.md b/arch-design/SKILL.md
deleted file mode 100644
index 744b8d0..0000000
--- a/arch-design/SKILL.md
+++ /dev/null
@@ -1,249 +0,0 @@
----
-name: arch-design
-description: Shape the architecture of a new or restructured software project through structured intake (quality attributes, stakeholders, constraints, scale, change drivers), then propose candidate architectural paradigms with honest trade-off analysis and a recommended direction. Paradigm-agnostic — evaluates options across layered, hexagonal, microservices, event-driven, CQRS, modular-monolith, serverless, pipe-and-filter, DDD, and others. Outputs a brief at `.architecture/brief.md` that downstream skills (arch-decide, arch-document, arch-evaluate) read. Use when starting a new project or service, restructuring an existing one, choosing a tech stack, or formalizing architecture before implementation. Do NOT use for bug fixing, code review, small feature additions, documenting an existing architecture (use arch-document), evaluating an existing architecture against a brief (use arch-evaluate), or recording specific individual decisions (use arch-decide). Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
----
-
-# Architecture Design
-
-Elicit the problem, surface the real drivers, propose a fit, and commit it to writing. One working session yields a `.architecture/brief.md` that anchors every downstream architectural decision.
-
-## When to Use This Skill
-
-- Starting a new project, service, or major subsystem
-- Restructuring or re-platforming an existing system
-- Selecting a primary tech stack for a green-field effort
-- Establishing a formal architecture before a team scales
-- Preparing a spike or prototype you want to keep coherent
-
-## When NOT to Use This Skill
-
-- Bug fixing or defect investigation
-- Small feature additions inside an existing architecture
-- Recording a single decision (use `arch-decide`)
-- Producing formal documentation from a known architecture (use `arch-document`)
-- Auditing an existing codebase against its stated architecture (use `arch-evaluate`)
-
-## Workflow
-
-The skill runs in five phases. Each produces a section of the output brief. Do not skip phases — the trade-off analysis is only as good as the intake.
-
-1. **Intake** — elicit stakeholders, domain, scale, timeline, team
-2. **Quality attributes** — prioritize (can't have everything)
-3. **Constraints** — technical, organizational, legal, cost
-4. **Candidate paradigms** — shortlist 2-4 with honest trade-off analysis
-5. **Recommendation** — pick one, justify it, flag open questions for ADRs
-
-## Phase 1 — Intake
-
-Ask the user to answer each. Short answers are fine; vague answers mean return to the question.
-
-**System identity**
-- What is this system? (One sentence.)
-- Who uses it? Human end users, other services, both?
-- What domain is it in? (commerce, health, comms, finance, ops, etc.)
-
-**Scale**
-- Expected traffic at launch and in 12 months? (RPS, MAU, payload sizes)
-- Data volume at launch and in 12 months? (rows/docs, GB)
-- Geographic distribution? (single region, multi-region, edge)
-
-**Team**
-- Team size now? Growing?
-- Existing language/stack expertise?
-- Operational maturity? (on-call rotation, observability tooling, CI/CD)
-
-**Timeline**
-- When does it need to be in production?
-- Is this replacing something? What's the migration path?
-
-**Change drivers**
-- What forces are likely to reshape this system? (new markets, regulatory, volume, organizational)
-
-## Phase 2 — Quality Attributes
-
-List the relevant quality attributes and force the user to rank them 1-5 (1 = critical, 5 = nice-to-have). You can't optimize everything; the ranking surfaces real priorities.
-
-Standard list (use this set; add domain-specific ones if relevant):
-
-- **Performance** (latency, throughput under load)
-- **Scalability** (horizontal, vertical, elastic)
-- **Availability** (uptime target, failure tolerance)
-- **Reliability** (data durability, correctness under partial failure)
-- **Security** (authn/z, data protection, threat model)
-- **Maintainability** (readability, testability, onboarding speed)
-- **Observability** (logs, metrics, tracing, debuggability)
-- **Deployability** (release cadence, rollback speed)
-- **Cost** (infra, operational, total cost of ownership)
-- **Compliance** (regulatory, audit, data residency)
-- **Interoperability** (integration with existing systems, APIs, standards)
-- **Flexibility / evolvability** (ease of adding features, changing direction)
-
-Document the ranking verbatim in the brief. Future decisions hinge on it.
-
-## Phase 3 — Constraints
-
-Enumerate what's fixed. Each constraint narrows the design space — make them explicit so trade-offs don't hide.
-
-- **Technical**: existing infrastructure, mandated languages/platforms, integration points that can't move
-- **Organizational**: team structure (Conway's Law — the org shape becomes the arch shape), existing expertise, hiring plan
-- **Legal/regulatory**: GDPR, HIPAA, FedRAMP, data residency, audit retention
-- **Cost**: budget ceiling, licensing limits, infra cost targets
-- **Timeline**: hard dates from business, regulatory deadlines
-- **Compatibility**: backward-compatibility with existing clients, API contracts, data formats
-
-## Phase 4 — Candidate Paradigms
-
-Pick 2-4 candidates that plausibly fit the quality-attribute ranking and constraints. Analyze each honestly — include the reasons it would fail, not just succeed.
-
-Common paradigms to consider:
-
-| Paradigm | Fits when… | Doesn't fit when… |
-|---|---|---|
-| **Modular monolith** | Small team, fast iteration, strong module boundaries, single deployment OK | Independent team scaling, different availability SLAs per module, polyglot requirements |
-| **Microservices** | Multiple teams, independent deploy cadences, polyglot, different scaling needs | Small team, tight transactional consistency needs, low operational maturity |
-| **Layered (n-tier)** | CRUD-heavy, clear request/response, team familiar with MVC | Complex domain logic, event-driven needs, async workflows dominate |
-| **Hexagonal / Ports & Adapters** | Business logic isolation, multiple interface types (HTTP + CLI + queue), testability priority | Trivial domains, overhead outweighs benefit |
-| **Event-driven / pub-sub** | Async workflows, fan-out to multiple consumers, decoupled evolution | Strong ordering + consistency needs, small team, low operational maturity |
-| **CQRS** | Read/write workload asymmetry, different optimization needs, audit trail required | Simple CRUD, no asymmetry, overhead not justified |
-| **Event sourcing** | Audit trail critical, temporal queries needed, reconstruction from events valuable | Simple state, team lacks event-sourcing expertise, storage cost prohibitive |
-| **Serverless (FaaS)** | Event-driven + variable load + fast iteration + accept vendor lock-in | Steady high load, latency-sensitive, long-running processes, tight cost control |
-| **Pipe-and-filter / pipeline** | Data transformation workflows, ETL, stream processing | Interactive request/response, low-latency |
-| **Space-based / in-memory grid** | Extreme throughput, elastic scale, in-memory OK | Strong durability required from day one, small scale |
-| **DDD (tactical)** | Complex domain, domain experts available, long-lived system | Simple CRUD, no real domain complexity, short-lived system |
-
-For each candidate, document:
-
-- **Summary** — one paragraph on what the architecture would look like
-- **Why it fits** — map to the ranked quality attributes
-- **Why it might not** — the honest cons for this specific context
-- **Cost** — team learning curve, operational overhead, infra impact
-- **Open questions** — what would need to be answered or decided via ADR
-
-## Phase 5 — Recommendation
-
-Choose one paradigm. Justify it by:
-
-- Which top-3 quality attributes the choice serves
-- Which constraints it respects
-- Why the rejected alternatives were rejected (not just "we picked X"; say why *not* Y)
-- What you're trading away (be explicit)
-
-Flag items that need their own ADRs — use `arch-decide` to record them. Examples:
-
-- Primary database choice
-- Message bus vs. direct calls
-- Sync vs. async inter-service comms
-- Multi-tenancy approach
-- Authentication/authorization boundary
-
-## Output: `.architecture/brief.md`
-
-Write the final brief to `.architecture/brief.md` in the project root. Use this structure:
-
-```markdown
-# Architecture Brief — <Project Name>
-
-**Date:** <YYYY-MM-DD>
-**Authors:** <names>
-**Status:** Draft | Accepted | Revised
-
-## 1. System
-
-<One-paragraph description of what the system does.>
-
-## 2. Stakeholders and Users
-
-- **Primary users:** …
-- **Secondary users:** …
-- **Operators:** …
-- **Integrators / dependent systems:** …
-
-## 3. Scale Targets
-
-| Metric | Launch | +12 months |
-|---|---|---|
-| RPS | … | … |
-| Data volume | … | … |
-| Geographic spread | … | … |
-
-## 4. Quality Attributes (Prioritized)
-
-| Rank | Attribute | Notes |
-|---|---|---|
-| 1 | … | critical driver |
-| 2 | … | … |
-
-## 5. Constraints
-
-- **Technical:** …
-- **Organizational:** …
-- **Legal/Regulatory:** …
-- **Cost:** …
-- **Timeline:** …
-
-## 6. Candidates Considered
-
-### Candidate A — <paradigm>
-
-Summary. Fit. Cons. Cost. Open questions.
-
-### Candidate B — <paradigm>
-
-…
-
-## 7. Recommendation
-
-**Chosen paradigm:** …
-
-**Rationale:**
-- …
-
-**Trade-offs accepted:**
-- …
-
-**Rejected alternatives and why:**
-- …
-
-## 8. Open Decisions (Candidates for ADRs)
-
-- [ ] Primary database — driver: …
-- [ ] …
-
-## 9. Next Steps
-
-- Run `arch-decide` for each open decision above
-- Run `arch-document` to produce the full arc42 document
-- Run `arch-evaluate` once implementation begins to audit conformance
-```
-
-## Review Checklist
-
-Before marking the brief Accepted:
-
-- [ ] Every quality attribute is ranked; no "all critical"
-- [ ] Every constraint is explicit; no "we probably can't"
-- [ ] At least 2 candidates considered; each has real cons
-- [ ] Recommendation names what it's trading away
-- [ ] Open decisions listed; each will become an ADR via `arch-decide`
-- [ ] Stakeholders have seen and (ideally) approved
-
-## Anti-Patterns
-
-- **"All quality attributes are critical."** They aren't. Force ranking.
-- **"Let's go microservices."** Without a ranking, constraints, and team context, that's cargo-culting.
-- **Single candidate.** One option means the user already decided; you're documenting, not designing.
-- **Silent trade-offs.** If you choose availability, you're trading latency or cost. Say so.
-- **No open decisions.** A real architecture has open questions. Listing none means you haven't looked hard.
-- **"Design doc" with no implementation implications.** The brief must be actionable — it drives ADRs, documentation, and evaluation.
-
-## Hand-Off
-
-After the brief is Accepted:
-
-- `arch-decide` — record each Open Decision as an ADR under `docs/adr/`
-- `arch-document` — expand into a full arc42-structured document under `docs/architecture/` with C4 diagrams
-- `arch-evaluate` — once code exists, audit it against this brief
-
-## Content scope
-
-Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/arch-document/SKILL.md b/arch-document/SKILL.md
deleted file mode 100644
index 24a3720..0000000
--- a/arch-document/SKILL.md
+++ /dev/null
@@ -1,317 +0,0 @@
----
-name: arch-document
-description: Produce a complete arc42-structured architecture document from a project's architecture brief and ADRs. Generates all twelve arc42 sections (Introduction & Goals, Constraints, Context & Scope, Solution Strategy, Building Block View, Runtime View, Deployment View, Crosscutting Concepts, Architecture Decisions, Quality Requirements, Risks & Technical Debt, Glossary). Dispatches to the c4-analyze and c4-diagram skills for building-block, container, and context diagrams. Outputs one file per section under `docs/architecture/`. Use when formalizing an architecture that already has a brief + ADRs, preparing documentation for a review, onboarding new engineers, or satisfying a compliance requirement. Do NOT use for shaping a new architecture (use arch-design), recording individual decisions (use arch-decide), auditing code against an architecture (use arch-evaluate), or for simple systems where a brief alone suffices. Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
----
-
-# Architecture Documentation (arc42)
-
-Turn an architecture brief and a set of ADRs into a full arc42-structured document. The result is the authoritative reference for the system — engineers, auditors, and new hires read this to understand what exists and why.
-
-## When to Use This Skill
-
-- A project has completed `arch-design` and has a `.architecture/brief.md`
-- Open decisions from the brief have been answered via `arch-decide` (ADRs)
-- The team needs documentation suitable for review, onboarding, or compliance
-- An existing system needs retroactive architecture docs
-
-## When NOT to Use This Skill
-
-- Before a brief exists (run `arch-design` first)
-- For small systems where a brief alone is sufficient (don't over-document)
-- To record a single decision (use `arch-decide`)
-- To audit code (use `arch-evaluate`)
-- When the team doesn't need arc42 — a lighter structure may serve better
-
-## Inputs
-
-Expected files, in order of preference:
-
-1. `.architecture/brief.md` — output of `arch-design`
-2. `docs/adr/*.md` — ADRs from `arch-decide`
-3. Codebase (for inferring module structure, dispatched to `c4-analyze`)
-4. Existing `docs/architecture/` contents (if updating rather than creating)
-
-If `.architecture/brief.md` is absent, stop and tell the user to run `arch-design` first. Do not fabricate the brief.
-
-## Workflow
-
-1. **Load and validate inputs** — brief present? ADRs found? Any existing arc42 docs?
-2. **Plan section coverage** — which of the twelve sections need content from the brief, ADRs, code, or user?
-3. **Generate each section** — draft from available inputs; explicitly mark gaps
-4. **Dispatch to C4 skills** — for sections needing diagrams (Context & Scope, Building Block View, Runtime View, Deployment View)
-5. **Cross-link** — ensure every ADR is referenced; every diagram has a narrative
-6. **Output** — one file per section under `docs/architecture/`, plus an index
-
-## The Twelve arc42 Sections
-
-For each: what it's for, what goes in it, where the content comes from, and what to dispatch.
-
-### 1. Introduction and Goals → `01-introduction.md`
-
-**Purpose:** Why does this system exist? What are the top business goals?
-
-**Content:**
-- One-paragraph system description (from brief §1)
-- Top 3-5 business goals (from brief §1-2)
-- Top 3 stakeholders and their concerns (from brief §2)
-- Top 3 quality goals (from brief §4, pulling the top-ranked attributes)
-
-**Sources:** brief §1, §2, §4.
-
-### 2. Architecture Constraints → `02-constraints.md`
-
-**Purpose:** What's non-negotiable? What narrows the solution space?
-
-**Content:**
-- Technical constraints (stack mandates, integration points)
-- Organizational constraints (team, expertise, timeline)
-- Conventions (style guides, naming, review process)
-- Legal/regulatory/compliance
-
-**Sources:** brief §5. Expand with any inherited constraints not in the brief.
-
-### 3. Context and Scope → `03-context.md`
-
-**Purpose:** What's inside the system? What's outside? What are the boundaries?
-
-**Content:**
-- **Business context:** external users, upstream/downstream systems, human actors. One diagram.
-- **Technical context:** protocols, data formats, deployment boundaries. One diagram or table.
-
-**Dispatch:** call the `c4-diagram` skill with a C4 System Context diagram request. Pass the brief's §1-2 (system identity + stakeholders) + integration points from §5 constraints.
-
-### 4. Solution Strategy → `04-solution-strategy.md`
-
-**Purpose:** The fundamental decisions that shape everything else.
-
-**Content:**
-- Primary architectural paradigm (from brief §7, the recommendation)
-- Top-level technology decisions (link to ADRs)
-- Decomposition strategy (how the system breaks into modules/services)
-- Approach to the top 3 quality goals (one sentence each: how does the strategy achieve performance / availability / security / whatever ranked highest)
-
-**Sources:** brief §7, ADRs.
-
-### 5. Building Block View → `05-building-blocks.md`
-
-**Purpose:** Static decomposition. Layer-by-layer breakdown.
-
-**Content:**
-- **Level 1** (whiteboard view): the handful of major components
-- **Level 2** (for significant components): internal structure
-- For each component: responsibility, key interfaces, dependencies, quality properties
-
-**Dispatch:** call `c4-analyze` on the codebase if code exists; it produces Container and Component diagrams. Embed those diagrams + narrative here. If no code exists, call `c4-diagram` with the decomposition from brief §7.
-
-### 6. Runtime View → `06-runtime.md`
-
-**Purpose:** Dynamic behavior. The interesting interactions.
-
-**Content:** 3-5 scenarios that matter (not every flow — the ones that illustrate key architecture). Each scenario:
-- What triggers it
-- Which components participate
-- Data flow
-- Error handling / failure modes
-
-**Dispatch:** sequence diagrams via `c4-diagram`. One per scenario.
-
-### 7. Deployment View → `07-deployment.md`
-
-**Purpose:** Where does the system run? How is it packaged?
-
-**Content:**
-- Deployment environments (dev, staging, prod; regions; edge)
-- Infrastructure topology (VMs, containers, serverless, managed services)
-- Allocation of building blocks to infrastructure
-- Network boundaries, data flow across them
-
-**Dispatch:** deployment diagram via `c4-diagram`.
-
-### 8. Crosscutting Concepts → `08-crosscutting.md`
-
-**Purpose:** Concerns that span the system — not owned by one module.
-
-**Content:** one subsection per concept. Typical concepts:
-- Security (authn/z, data protection, secrets management)
-- Error handling (retries, circuit breakers, dead-letter queues)
-- Logging, metrics, tracing
-- Configuration and feature flags
-- Persistence patterns (ORM? repository? direct SQL?)
-- Concurrency model
-- Caching strategy
-- Internationalization
-- Testability approach
-
-Only include concepts that are actually crosscutting in *this* system. If error handling is owned by one service, it's not crosscutting.
-
-### 9. Architecture Decisions → `09-decisions.md`
-
-**Purpose:** Index of the significant decisions. Not the ADRs themselves — a curated summary.
-
-**Content:** a table linking out to every ADR in `docs/adr/`:
-
-| ADR | Decision | Status | Date |
-|---|---|---|---|
-| [0001](../adr/0001-database.md) | Primary database: PostgreSQL | Accepted | 2024-01-10 |
-
-Plus a one-sentence summary of *why this set of decisions, not others*. The meta-rationale.
-
-**Sources:** `docs/adr/*.md`. Auto-generate the table from the filesystem; update on each run.
-
-### 10. Quality Requirements → `10-quality.md`
-
-**Purpose:** Measurable quality targets. Not "the system should be fast" — specific scenarios.
-
-**Content:**
-- **Quality tree** — the ranked quality attributes from brief §4, each with refinements
-- **Quality scenarios** — specific, testable. Format: "Under [condition], the system should [response] within [measure]."
-
-Example:
-
-> Under peak traffic (10,000 concurrent users), a cart checkout should complete in under 2 seconds at the 95th percentile.
-
-Minimum 1 scenario per top-3 quality attribute.
-
-**Sources:** brief §4, §7 (rationale).
-
-### 11. Risks and Technical Debt → `11-risks.md`
-
-**Purpose:** Known risks and liabilities. A snapshot of what could go wrong.
-
-**Content:**
-
-| Risk / Debt | Impact | Likelihood | Mitigation / Plan |
-|---|---|---|---|
-| Single DB becomes bottleneck at 50k RPS | High | Medium (12mo) | Add read replicas; monitor query latency |
-
-Plus a short narrative on known technical debt, what caused it, and when it'll be addressed.
-
-**Sources:** brief §7 (trade-offs accepted), team knowledge. Refresh regularly.
-
-### 12. Glossary → `12-glossary.md`
-
-**Purpose:** Shared vocabulary. Terms, acronyms, and domain-specific words with agreed definitions.
-
-**Content:** table, alphabetical.
-
-| Term | Definition |
-|---|---|
-| Cart | A customer's in-progress selection of items before checkout. |
-| Order | A confirmed, paid transaction resulting from a checked-out cart. |
-
-Keep disciplined: when two people use the same word differently, the glossary is where the disagreement surfaces.
-
-## Output Layout
-
-```
-docs/architecture/
-├── README.md                  # Index; generated from the 12 sections
-├── 01-introduction.md
-├── 02-constraints.md
-├── 03-context.md
-├── 04-solution-strategy.md
-├── 05-building-blocks.md
-├── 06-runtime.md
-├── 07-deployment.md
-├── 08-crosscutting.md
-├── 09-decisions.md
-├── 10-quality.md
-├── 11-risks.md
-├── 12-glossary.md
-└── diagrams/                  # C4 outputs land here
-    ├── context.svg
-    ├── container.svg
-    ├── component-<module>.svg
-    └── runtime-<scenario>.svg
-```
-
-### README.md (auto-generated index)
-
-```markdown
-# Architecture Documentation — <Project Name>
-
-arc42-structured documentation. Read top to bottom for context; skip to a
-specific section via the index.
-
-## Sections
-
-1. [Introduction and Goals](01-introduction.md)
-2. [Architecture Constraints](02-constraints.md)
-3. [Context and Scope](03-context.md)
-4. [Solution Strategy](04-solution-strategy.md)
-5. [Building Block View](05-building-blocks.md)
-6. [Runtime View](06-runtime.md)
-7. [Deployment View](07-deployment.md)
-8. [Crosscutting Concepts](08-crosscutting.md)
-9. [Architecture Decisions](09-decisions.md)
-10. [Quality Requirements](10-quality.md)
-11. [Risks and Technical Debt](11-risks.md)
-12. [Glossary](12-glossary.md)
-
-## Source Documents
-
-- Brief: [`.architecture/brief.md`](../../.architecture/brief.md)
-- Decisions: [`../adr/`](../adr/)
-
-## Last Updated
-
-<YYYY-MM-DD> — regenerate by running `arch-document` after brief or ADR changes.
-```
-
-## Dispatch to C4 Skills
-
-Several sections need diagrams. The `arch-document` skill does not generate diagrams directly; it dispatches:
-
-- **Section 3 (Context and Scope)** → `c4-diagram` for a System Context diagram. Input: system name, external actors, external systems.
-- **Section 5 (Building Block View)** → `c4-analyze` if the codebase exists (it infers Container + Component diagrams). Otherwise `c4-diagram` with decomposition from brief.
-- **Section 6 (Runtime View)** → `c4-diagram` for sequence diagrams, one per scenario.
-- **Section 7 (Deployment View)** → `c4-diagram` for a deployment diagram.
-
-When dispatching, include: the relevant section narrative + the specific diagram type + any identifiers needed. Capture outputs in `docs/architecture/diagrams/` and embed via relative markdown image links.
-
-## Gap Handling
-
-If information is missing for a section:
-
-- **Brief gap:** stop and ask the user, or mark the section `TODO: resolve with arch-design`
-- **ADR gap:** stub the section with a `TODO: run arch-decide for <decision>` and proceed
-- **Code gap:** if dispatching to c4-analyze fails (no code yet), fall back to c4-diagram with brief-derived decomposition
-
-Never fabricate content. A `TODO` is honest; a plausible-sounding but made-up detail is misleading.
-
-## Review Checklist
-
-Before marking the documentation complete:
-
-- [ ] Every section has either real content or an explicit TODO
-- [ ] Every ADR is referenced in section 9
-- [ ] Diagrams are present for sections 3, 5, 6, 7
-- [ ] Quality scenarios (§10) are specific and measurable
-- [ ] Risks (§11) include likelihood and mitigation
-- [ ] Glossary (§12) captures domain terms, not just jargon
-- [ ] README index lists all sections with correct relative links
-
-## Anti-Patterns
-
-- **Template filling without thought.** Copying section headings but making up the content. If you don't have the data, mark TODO.
-- **Diagrams without narrative.** A diagram in isolation tells half the story. Each diagram needs a paragraph saying what to notice.
-- **Every concept as crosscutting.** If security is owned by one service, it's not crosscutting — put it in that service's building-block entry.
-- **Uncurated ADR dump.** Section 9 should be a curated index with the *reason* these decisions hang together, not just a directory listing.
-- **Vague quality requirements.** "The system should be fast" is useless. Specify scenarios with measurable bounds.
-- **Risk-free architecture.** Every real system has risks. A blank risks section means you didn't look.
-
-## Maintenance
-
-arc42 is a living document. Re-run `arch-document` when:
-
-- A new ADR is added → section 9 refreshes
-- The brief is revised → sections 1, 4, 10 may change
-- The codebase restructures → section 5 (via re-running c4-analyze)
-- A new deployment target is added → section 7
-- A scenario's behavior or SLO changes → section 6 or 10
-
-Commit each regeneration. The document's Git history is part of the architecture record.
-
-## Content scope
-
-Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/arch-evaluate/SKILL.md b/arch-evaluate/SKILL.md
deleted file mode 100644
index f1c3391..0000000
--- a/arch-evaluate/SKILL.md
+++ /dev/null
@@ -1,332 +0,0 @@
----
-name: arch-evaluate
-description: Audit an existing codebase against its stated architecture brief and ADRs. Runs framework-agnostic checks (cyclic dependencies, stated-layer violations, public API drift) that work on any language without setup, and opportunistically invokes language-specific linters (dependency-cruiser for TypeScript, import-linter for Python, go vet + depguard for Go) when they're already configured in the repo — augmenting findings, never replacing. Produces a report with severity levels (error / warning / info) and pointers to the relevant brief section or ADR for each violation. Use when auditing conformance before a release, during code review, when an architecture is suspected to have drifted, or as a pre-merge CI gate. Do NOT use for designing an architecture (use arch-design), recording decisions (use arch-decide), or producing documentation (use arch-document). Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
----
-
-# Architecture Evaluation
-
-Audit an implementation against its stated architecture. Framework-agnostic by default; augmented by language-specific linters when they're configured in the repo. Reports violations with severity and the rule they break.
-
-## When to Use This Skill
-
-- Auditing a codebase before a major release
-- During code review of a structurally significant change
-- When architecture is suspected to have drifted from its documented intent
-- As a pre-merge check (output can feed CI)
-- Before an architecture review meeting
-
-## When NOT to Use This Skill
-
-- No brief or ADRs exist (run `arch-design` and `arch-decide` first)
-- Shaping a new architecture (use `arch-design`)
-- Recording a single decision (use `arch-decide`)
-- Generating documentation (use `arch-document`)
-- Deep code-quality review (this skill checks *structural* conformance, not line-level quality — use a code review skill for that)
-
-## Inputs
-
-1. `.architecture/brief.md` — the source of truth for paradigm, layers, quality attributes
-2. `docs/adr/*.md` — specific decisions that the code must honor
-3. `docs/architecture/*.md` — if present, used for additional structural context
-4. The codebase itself
-
-If the brief is missing, stop and tell the user to run `arch-design` first. Do not guess at intent.
-
-## Workflow
-
-1. **Load the brief and ADRs.** Extract: declared paradigm, layers (if any), forbidden dependencies, module boundaries, API contracts that matter.
-2. **Detect repo languages and linters.** Inspect `package.json`, `pyproject.toml`, `go.mod`, `pom.xml`, etc.
-3. **Run framework-agnostic checks.** Always. These never need tooling.
-4. **Run language-specific tools if configured.** Opportunistic — only if the repo already has `.dependency-cruiser.cjs`, `.importlinter`, `.golangci.yml` with import rules, etc. Never install tooling.
-5. **Combine findings.** Deduplicate across sources. Label each finding with provenance (native / tool).
-6. **Produce report.** Severity-sorted markdown at `.architecture/evaluation-<date>.md`.
-
-## Framework-Agnostic Checks
-
-These work on any language. Claude reads the code and applies the policy from the brief.
-
-### 1. Cyclic Dependencies
-
-Scan imports/requires/includes across the codebase. Build the module graph. Report any cycles.
-
-**Severity:** Error (cycles are almost always architecture bugs).
-
-**Scale limit:** for codebases over ~100k lines, this check is noisy — prefer the language-specific tool output (much faster and complete).
-
-### 2. Stated-Layer Violations
-
-If the brief declares layers (e.g., `presentation → application → domain → infrastructure`), scan imports for arrows that go the wrong way.
-
-**Policy format in the brief:**
-
-```
-Layers (outer → inner):
-  presentation → application → domain
-  presentation → application → infrastructure
-  application → infrastructure (repositories only)
-  domain → (none)
-```
-
-**Check:** each import statement's target must be reachable from the source via the declared arrows. Flag any that isn't.
-
-**Severity:** Error when the violation crosses a top-level layer. Warning when it's a within-layer oddity.
-
-### 3. Public API Drift
-
-If the brief or an ADR documents the public interface of a module/package (exported types, functions, endpoints), compare the declared interface to what the code actually exports.
-
-**Sources for expected interface:**
-
-- ADRs with code-block signatures
-- Brief §7 or Candidate description
-- Any `docs/architecture/05-building-blocks.md` section
-
-**Check:** listed public names exist; no additional exports are marked public unless recorded.
-
-**Severity:** Warning (intended additions may just lack an ADR yet).
-
-### 4. Open Decision vs Implementation
-
-For each item in brief §8 (Open Decisions): has it been decided via an ADR? Is the implementation consistent with that ADR?
-
-**Severity:** Warning (drift here usually means someone made a decision without recording it).
-
-### 5. Forbidden Dependencies
-
-The brief may list forbidden imports explicitly. Example: "Domain module must not import from framework packages (Django, FastAPI, etc.)." Check.
-
-**Severity:** Error.
-
-## Language-Specific Tools (Opportunistic)
-
-These run only if the user's repo has a config file already present. If not configured, skip silently — the framework-agnostic checks still run.
-
-### TypeScript — dependency-cruiser
-
-**Detected when:** `.dependency-cruiser.cjs`, `.dependency-cruiser.js`, or `dependency-cruiser` config in `package.json`.
-
-**Invocation:**
-
-```bash
-npx dependency-cruiser --validate .dependency-cruiser.cjs src/
-```
-
-**Parse:** JSON output (`--output-type json`). Each violation becomes a finding with severity mapped from the rule's `severity`.
-
-### Python — import-linter
-
-**Detected when:** `.importlinter`, `importlinter.ini`, or `[tool.importlinter]` in `pyproject.toml`.
-
-**Invocation:**
-
-```bash
-lint-imports
-```
-
-**Parse:** exit code + text output. Contract failures are errors; contract warnings are warnings.
-
-### Python — grimp (supplementary)
-
-If `import-linter` isn't configured but Python code is present, a lightweight check using `grimp` can still detect cycles:
-
-```bash
-python -c "import grimp; g = grimp.build_graph('your_package'); print(g.find_shortest_chains('a.module', 'b.module'))"
-```
-
-Skip unless the user enables it.
-
-### Go — go vet + depguard
-
-**Detected when:** `.golangci.yml` or `.golangci.yaml` contains a `depguard` linter entry.
-
-**Invocation:**
-
-```bash
-go vet ./...
-golangci-lint run --disable-all --enable=depguard ./...
-```
-
-**Parse:** standard Go-style output. Depguard violations mapped to layer rules in the brief.
-
-### Future Languages
-
-For Java, C, C++ — tooling exists (ArchUnit, include-what-you-use, cpp-linter) but isn't integrated in v1. Framework-agnostic checks still apply.
-
-## Tool Install Commands (for reference)
-
-Included so the skill can tell the user what to install if they want the tool-augmented checks. The skill never installs; the user does.
-
-### Python
-
-```bash
-pipx install import-linter     # or: pip install --user import-linter
-pipx install grimp             # optional, supplementary
-```
-
-### TypeScript / JavaScript
-
-```bash
-npm install --save-dev dependency-cruiser
-```
-
-### Go
-
-```bash
-# golangci-lint includes depguard
-brew install golangci-lint     # macOS
-# or
-go install github.com/golangci/golangci-lint/cmd/golangci-lint@latest
-```
-
-### Java (future v2+)
-
-```xml
-<!-- pom.xml -->
-<dependency>
-  <groupId>com.tngtech.archunit</groupId>
-  <artifactId>archunit-junit5</artifactId>
-  <version>1.3.0</version>
-  <scope>test</scope>
-</dependency>
-```
-
-### C / C++ (future v2+)
-
-```bash
-brew install include-what-you-use
-# or on Linux: package manager, or build from source
-```
-
-## Config Generation
-
-The skill does not auto-generate linter configs. That's v2+. For now, if the user wants tool-augmented checks, they configure the linter themselves, using the brief as a guide. Document the desired rules in the brief so humans (and future Claude sessions) can translate consistently.
-
-Example — mapping brief layers to `import-linter` contracts:
-
-```ini
-# .importlinter
-[importlinter]
-root_package = myapp
-
-[importlinter:contract:layers]
-name = Core layers
-type = layers
-layers =
-    myapp.presentation
-    myapp.application
-    myapp.domain
-    myapp.infrastructure
-```
-
-## Output: `.architecture/evaluation-<date>.md`
-
-Write the report to `.architecture/evaluation-<YYYY-MM-DD>.md`. Use this structure:
-
-```markdown
-# Architecture Evaluation — <Project Name>
-
-**Date:** <YYYY-MM-DD>
-**Brief version:** <commit hash or date of `.architecture/brief.md`>
-**Checks run:** framework-agnostic + <detected tools>
-
-## Summary
-
-- **Errors:** <N>
-- **Warnings:** <N>
-- **Info:** <N>
-
-## Findings
-
-### Errors
-
-#### E1. Cyclic dependency: domain/user ↔ domain/order
-
-- **Source:** framework-agnostic
-- **Files:** `src/domain/user.py:14`, `src/domain/order.py:7`
-- **Rule:** Brief §7 — "Domain modules must not form cycles."
-- **Fix:** extract shared abstraction into a new module, or break the cycle by inverting one direction.
-
-#### E2. Forbidden dependency: domain imports Django
-
-- **Source:** import-linter (contract: framework_isolation)
-- **Files:** `myapp/domain/user.py:3`
-- **Rule:** Brief §5 — "Domain layer must not depend on framework packages."
-- **Related ADR:** [ADR-0004 Framework isolation](../docs/adr/0004-framework-isolation.md)
-- **Fix:** move the Django-specific logic to `myapp/infrastructure`.
-
-### Warnings
-
-#### W1. Public API drift: `OrderService.cancel()` added without ADR
-
-- **Source:** framework-agnostic
-- **File:** `src/domain/order.py:142`
-- **Rule:** Brief §8 — "Public API additions require an ADR."
-- **Fix:** run `arch-decide` to record the rationale, or make `cancel()` non-public.
-
-### Info
-
-#### I1. Open decision unresolved: message bus vs direct calls
-
-- **Source:** framework-agnostic
-- **Rule:** Brief §8 item 3.
-- **Fix:** run `arch-decide` to select and record.
-
-## Tool Output (raw)
-
-<Collapsed raw output from each tool that ran, for reference.>
-
-## Next Steps
-
-- Address all Errors before merge
-- Triage Warnings: either fix, record as ADR, or update the brief
-- Review Info items at the next architecture meeting
-```
-
-## Severity Mapping
-
-| Severity | Meaning | Action |
-|---|---|---|
-| **Error** | Structural violation of declared architecture. Code contradicts brief/ADR. | Block merge. Fix or update the brief (with an ADR). |
-| **Warning** | Deviation that may be intentional but wasn't recorded. | Discuss. Record via `arch-decide` or fix. |
-| **Info** | Open question or unresolved decision. Not a violation, but attention-worthy. | Triage. |
-
-## Review Checklist
-
-Before handing off the report:
-
-- [ ] All framework-agnostic checks ran
-- [ ] Detected linters ran if configured; skipped silently if not
-- [ ] Each finding has: severity, source (native or tool name), file/line, rule reference, suggested fix
-- [ ] Each finding links to the brief section or ADR that establishes the rule
-- [ ] Raw tool output preserved at the bottom for traceability
-- [ ] Report timestamped and commit-referenced
-
-## Anti-Patterns
-
-- **Silent failures.** A tool that errored out is a finding too — report it (Info: "dependency-cruiser failed to parse config; no TypeScript import checks ran").
-- **Swallowing open decisions.** An unresolved item in brief §8 is a legitimate warning. Don't treat it as "not a violation."
-- **Tool-only reports.** If the language-specific tool is configured and clean, don't skip the framework-agnostic checks — they cover things the tool doesn't (API drift, open decisions).
-- **Rule references missing.** Every finding must cite the rule. If you can't find the rule, the finding isn't actionable.
-- **Error inflation.** Reserve Error for real violations. Warnings exist to avoid crying wolf.
-
-## Integration with CI (future)
-
-v1 produces a markdown report. v2+ will add:
-
-- JSON output (machine-readable)
-- Exit-code behavior (non-zero when any Error)
-- `--fail-on-warnings` flag for strict mode
-- Delta mode (only report *new* findings since last evaluation)
-- Auto-generation of linter configs from the brief
-
-See `docs/architecture/v2-todo.org` in the rulesets repo for the full deferred list.
-
-## Hand-Off
-
-After the report:
-
-- **Errors** → fix or justify (each justification becomes an ADR via `arch-decide`)
-- **Warnings** → record ADRs for the deliberate ones; fix the rest
-- **Info** → resolve open decisions via `arch-decide`
-- After significant fixes, re-run `arch-evaluate` to verify
-- Consider re-running `arch-document` if the brief or ADRs changed as a result
diff --git a/brainstorm/SKILL.md b/brainstorm/SKILL.md
deleted file mode 100644
index b47523f..0000000
--- a/brainstorm/SKILL.md
+++ /dev/null
@@ -1,195 +0,0 @@
----
-name: brainstorm
-description: Refine a vague idea into a validated design through structured one-question-at-a-time dialogue, diverse option exploration (three conventional + three tail samples), and chunked validation (200-300 words at a time). Produces `docs/design/<topic>.md` as the output artifact. Use when shaping a new feature, service, or workflow before implementation begins — or when a "we should probably…" idea needs to become concrete enough to build. Do NOT use for mechanical well-defined work (renames, reformats, dependency bumps), for system-level architecture choices (use arch-design), for recording a single decision that has already been made (use arch-decide), or for debugging an existing error (use root-cause-trace or debug). Synthesized from the Agentic-Context-Engineering / SDD brainstorm pattern — probabilistic diversity sampling originated there.
----
-
-# Brainstorm
-
-Turn a rough idea into a design document concrete enough to implement. One question at a time, diverse options considered honestly, design validated in chunks.
-
-## When to Use
-
-- Shaping a new feature, component, service, or workflow before code exists
-- Translating "we should probably …" into something buildable
-- Converting a noticed-but-unshaped problem into a design
-- Exploring whether an idea is worth building before committing to it
-
-## When NOT to Use
-
-- Mechanical, well-defined work (rename, reformat, upgrade a dependency, apply a migration)
-- System-level architecture (use `arch-design`)
-- Recording a specific decision already made (use `arch-decide`)
-- Debugging an existing error (use `root-cause-trace` or `debug`)
-- Writing code whose shape is already clear to you
-
-## Workflow
-
-Three phases. Each converges a little more. Each is validated before moving to the next.
-
-### Phase 1 — Understand the Idea
-
-Dialogue, not interrogation. Before the first question, read the project context: the relevant directory, recent commits, any existing docs, the language and stack. Ground your questions in what's already there.
-
-**Rules:**
-
-- **One question per message.** Never batch. Respondents answer the easiest question and skip the rest when you batch.
-- **Prefer multiple choice.** Easier to answer, faster to skim, surfaces the option space for free.
-- **Open-ended when the option space is too large to enumerate.** Then refine with follow-ups.
-- **Focus the first questions on purpose, not mechanism.** Mechanism comes in phase 2.
-
-**Topics to cover (not a script — skip what's already clear):**
-
-- What problem does this solve?
-- Who is the user or caller?
-- What's the smallest version that would be useful?
-- What's explicitly out of scope?
-- What are the success criteria (measurable if possible)?
-- What constraints apply — team size, timeline, existing code, stack?
-
-**Stop when** you can state the idea back in one sentence and the user confirms. Don't keep asking for the sake of thoroughness.
-
-### Phase 2 — Explore Approaches
-
-Before committing to a direction, generate **six candidate approaches**. Force diversity.
-
-**Composition:**
-
-- **Three conventional approaches** — the paths a reasonable engineer with relevant experience would consider. Each has a high prior probability of being right (roughly ≥80%).
-- **Three tail samples** — approaches from genuinely different regions of the solution space. Each individually unlikely (roughly ≤10%), but together they expand what's been considered. These are the ones that surprise.
-
-Why tail samples matter: most teams converge on the first conventional option. The tail samples either reveal a better solution you hadn't considered, or they clarify *why* the conventional option is the right one. Either way, the recommendation that follows is stronger.
-
-**For each candidate, state:**
-
-- One-paragraph summary
-- Honest pros
-- Honest cons (no selling; if you can't name real cons, you haven't thought hard enough)
-
-**End with:**
-
-- Your recommendation
-- Why — explicit mapping to the constraints and success criteria from phase 1
-- What's being traded away
-- What becomes an open decision for `arch-decide` later
-
-### Phase 3 — Present the Design
-
-Once a direction is chosen, describe it in **200-300 word chunks**. After each chunk, ask "does this look right so far?" Never dump a wall of design prose — the user skims walls.
-
-**Typical sections, one chunk each:**
-
-1. **Architecture** — components, boundaries, key interfaces
-2. **Data flow** — what moves through where, in what shape
-3. **Persistence** — what's stored, where, for how long, with what durability
-4. **Error handling** — expected failures, responses, user-facing behavior
-5. **Testing approach** — what correctness means here, how it gets verified
-6. **Observability** — what gets logged, measured, traced
-
-Skip sections that don't apply. Add domain-specific ones (auth flow, concurrency model, migration plan, rollout strategy) where relevant.
-
-**Be willing to back up.** If a chunk surfaces a question that invalidates an earlier chunk, revise. Committing to a wrong direction to avoid rework is the expensive path.
-
-## Output
-
-Write the validated design to `docs/design/<topic>.md`. Use this skeleton (omit sections that don't apply):
-
-```markdown
-# Design: <topic>
-
-**Date:** <YYYY-MM-DD>
-**Status:** Draft | Accepted | Implemented
-
-## Problem
-
-One paragraph. What, who, why now.
-
-## Non-Goals
-
-Bullet list of what this explicitly does not address. Prevents scope creep.
-
-## Approaches Considered
-
-### Recommended: <name>
-
-Summary, pros, cons.
-
-### Rejected: <name>
-
-Why not. Brief.
-
-(Include 2-3 rejected options — showing alternatives were weighed is itself valuable.)
-
-## Design
-
-### Architecture
-
-### Data Flow
-
-### Persistence
-
-### Error Handling
-
-### Testing
-
-### Observability
-
-## Open Questions
-
-Items that need decisions before or during implementation.
-
-- [ ] Question — likely candidate for an ADR via `arch-decide`
-- [ ] Question — …
-
-## Next Steps
-
-- Open questions → `arch-decide`
-- If this implies system-level structural change → `arch-design`
-- Implementation → <agreed next action>
-```
-
-If `docs/design/` doesn't exist, create it. If a design with the same topic name exists, ask before overwriting.
-
-## Hand-Off
-
-After the design is written and agreed:
-
-- **Each open question** → run `arch-decide` to record as an ADR
-- **System-level implications** → run `arch-design` to update the architecture brief
-- **Implementation** → whatever the project's implementation path is (`start-work`, `add-tests`, or direct work)
-
-Link the design doc from wherever it's being implemented (PR description, ticket, commit).
-
-## Review Checklist
-
-Before declaring the design accepted:
-
-- [ ] Problem stated in one paragraph that a newcomer could understand
-- [ ] Non-goals listed (at least 1-2)
-- [ ] At least 6 approaches considered, with 3 genuinely in the tail
-- [ ] Recommendation names what it's trading away
-- [ ] Each design chunk was individually validated
-- [ ] Open questions listed; each has a clear path forward
-- [ ] User has confirmed the design reflects their intent
-
-## Anti-Patterns
-
-- **Asking three questions in one message.** The user answers the easiest. Ask one.
-- **Leading questions.** "Don't you think Postgres is right here?" isn't exploration.
-- **Skipping tail samples.** If all six options are minor variations on the conventional answer, diversity wasn't actually attempted.
-- **Dumping the whole design at once.** Eight hundred words without validation checkpoints means the user skims and misses the thing they'd want to push back on.
-- **Hiding trade-offs.** Every approach has real cons. State them or the evaluation is theatre.
-- **"We'll figure it out in implementation."** If a design question is ducked here, it becomes a larger, costlier problem during coding. Resolve now or explicitly defer with an ADR.
-- **Over-specifying.** The design should be detailed enough to implement, not so detailed it's already the implementation. If you're writing function bodies in the design, stop.
-
-## Key Principles
-
-- **One question at a time.** Non-negotiable.
-- **Multiple choice beats open-ended** when the option space is small enough to enumerate.
-- **Six approaches, three in the tail.** Discipline around diversity.
-- **Validate in chunks.** 200-300 words, then check.
-- **YAGNI ruthlessly.** Remove anything not justified by the problem statement.
-- **Back up when needed.** Revising early beats committing to wrong.
-
-## Content scope
-
-Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/c4-analyze/SKILL.md b/c4-analyze/SKILL.md
deleted file mode 100644
index ab8986b..0000000
--- a/c4-analyze/SKILL.md
+++ /dev/null
@@ -1,212 +0,0 @@
----
-name: c4-analyze
-description: Analyze a codebase or git repo and generate C4 architecture diagrams (System Context, Container, Component). Use when the user wants to visualize or document the architecture of an existing project. Dispatched by arch-document for building-block and container views; usable standalone for quick architecture visualization. Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
-argument-hint: "[path-or-container-name]"
----
-
-# /c4-analyze — Generate C4 Architecture Diagrams from Code
-
-Analyze the current codebase (or a specified path) and generate C4 model diagrams
-following Simon Brown's methodology.
-
-## Arguments
-
-- No argument: analyze the current working directory
-- A path (e.g., `./services/api`): analyze that subdirectory
-- A container name from a previous analysis (e.g., `api-server`): generate a Level 3 Component diagram for that container
-
-## Step 1: Reconnaissance
-
-Scan the repository to understand its structure. Look for these signals:
-
-### Project boundaries and workspaces
-- `pnpm-workspace.yaml`, `lerna.json`, `nx.json`, `turbo.json` → JS/TS monorepo
-- `go.work`, multiple `go.mod` files → Go workspace
-- `Cargo.toml` with `[workspace]` → Rust workspace
-- Multiple `pom.xml` or `settings.gradle` → Java multi-module
-- Multiple `pyproject.toml`, `setup.py`, or a `packages/` directory → Python monorepo
-
-### Containers (deployable units)
-- `Dockerfile`, `docker-compose.yml`, `docker-compose.yaml` → containerized services
-- Kubernetes manifests (`k8s/`, `*.yaml` with `kind: Deployment`) → k8s services
-- `serverless.yml`, `sam-template.yaml`, AWS CDK/Terraform with Lambda → serverless functions
-- Separate apps with their own entry points (`main.go`, `manage.py`, `index.ts`, `Program.cs`)
-- Database migration directories (`migrations/`, `alembic/`, `flyway/`)
-
-### Technology detection
-- Package managers: `package.json`, `requirements.txt`, `pyproject.toml`, `go.mod`, `Cargo.toml`, `pom.xml`, `Gemfile`, `mix.exs`
-- Frameworks: look at imports/dependencies (Django, Flask, FastAPI, Express, Next.js, Spring Boot, Rails, Phoenix, etc.)
-- Databases: connection strings in config, ORM models, migration tools
-- Message queues: Kafka, RabbitMQ, Redis, SQS client imports
-
-### External system integrations
-- HTTP client calls to external APIs (look for base URLs, API client classes)
-- SDK imports (AWS SDK, Stripe, Twilio, SendGrid, etc.)
-- OAuth/SAML/SSO integrations
-- Third-party service configuration (environment variables referencing external URLs)
-
-### Users/actors
-- Authentication providers → user types
-- API key/token auth → machine clients
-- Admin interfaces → admin users
-- Public vs internal endpoints → different user groups
-
-## Step 2: Build the Model
-
-Organize findings into the C4 hierarchy:
-
-```
-Software System
-├── People (users, roles, external actors)
-├── External Systems (third-party services, APIs)
-└── Containers (your deployable units)
-    ├── Name, technology, description
-    ├── Relationships to other containers
-    └── Components (for drill-down)
-        ├── Name, technology, description
-        └── Relationships to other components
-```
-
-For each element, capture:
-- **Name**: Clear, specific (not "business logic" or "backend")
-- **Technology**: The actual framework/language/platform
-- **Description**: One sentence about its responsibility
-- **Relationships**: What it talks to, how (protocol), and why (purpose)
-
-## Step 3: Generate Diagrams
-
-Generate diagrams as **draw.io XML** (`.drawio` files). Always generate Level 1 and Level 2.
-
-### draw.io XML format
-
-Use the `<mxfile><diagram><mxGraphModel>` structure with `<mxCell>` elements.
-
-### C4 color scheme
-
-| Element | Fill Color | Stroke Color | Font Color | Shape |
-|---------|-----------|-------------|-----------|-------|
-| Person | `#08427B` | `#073B6F` | `#ffffff` | `shape=mxgraph.c4.person2` |
-| System (in scope) | `#1168BD` | `#0B4884` | `#ffffff` | `rounded=1;arcSize=10` |
-| System (external) | `#999999` | `#8A8A8A` | `#ffffff` | `rounded=1;arcSize=10` |
-| Container | `#438DD5` | `#3C7FC0` | `#ffffff` | `rounded=1;arcSize=10` |
-| Container (database) | `#438DD5` | `#3C7FC0` | `#ffffff` | `shape=cylinder3` |
-| Component | `#85BBF0` | `#78A8D8` | `#000000` | `rounded=1;arcSize=10` |
-| Planned/future | `#CCCCCC` | `#AAAAAA` | `#ffffff` | `dashed=1` |
-| Boundary | none | `#0B4884` (system) / `#3C7FC0` (container) | match stroke | `swimlane;dashed=1;dashPattern=8 4` |
-
-### Element label format
-
-Every element label should include (using HTML in the `value` attribute):
-- **Bold name**
-- Element type in small text: `[Person]`, `[Software System]`, `[Container: Technology]`, `[Component: Technology]`
-- Description in normal text
-
-Example: `<b>API Server</b><br><font style="font-size:10px">[Container: Django 6.0, Python 3.12]</font><br><br><font style="font-size:11px">REST API handling mission CRUD and agent orchestration</font>`
-
-### Diagram structure rules
-
-- **Title**: Text cell at the top with `fontSize=18;fontStyle=1`
-- **Key/legend**: Small boxes in a corner showing the color scheme
-- **Relationships**: `edgeStyle=orthogonalEdgeStyle;rounded=1` with label showing purpose + protocol. Always use proper `source` and `target` attributes referencing element IDs — never use manual `sourcePoint`/`targetPoint` coordinates, which render as floating disconnected lines.
-- **Boundaries**: Use `swimlane` style with `container=1;collapsible=0` — child elements use `parent="boundaryId"`
-- **Layout**: People at top, system in scope in center, external systems around the edges or bottom. Most important element centered.
-- **Canvas size**: `pageWidth="1600" pageHeight="1200"` minimum; increase for complex diagrams
-
-### Spacing and layout rules (critical)
-
-These rules prevent overlapping text and unreadable diagrams:
-
-- **140px minimum** between context elements (people, external containers) and the boundary top, so edge labels don't overlap the boundary title text.
-- **200px minimum** between boundary bottom and external systems below, so edge labels between them are readable and don't overlap the boundary line.
-- **150px minimum gap** between sibling elements in the same row (containers, components), so relationship labels between them have room.
-- **Center contents inside boundaries**: calculate total content width, subtract from boundary width, divide by 2 for left margin. Do this for each row independently.
-- **Use explicit waypoints** (`<Array as="points">`) to route edges around obstacles. Waypoints should run through the gap between the boundary edge and external systems.
-
-### Edge routing rules
-
-- **Never route an edge through a component or container.** If an edge would cross an element, spread elements apart to create a clear lane, or reroute the edge with waypoints.
-- **Edges that skip rows must route around, not through.** If a service in row 2 connects to an external system below the boundary, and row 3 (data stores) is in between, the edge must route along the left or right margin of the boundary and exit from the bottom — never vertically through row 3. Use waypoints to hug the boundary edge.
-- **Cross-row edges** (e.g., connecting elements that aren't adjacent) must route above or below the intervening row using waypoints, not through it.
-- **Edges exiting a boundary** to reach external systems should use waypoints at intermediate Y positions between the boundary bottom and the external system top, with each edge at a different Y to avoid overlapping.
-- **All edges must use proper `source` and `target` attributes** referencing element IDs. Never use `sourcePoint`/`targetPoint` manual coordinates — these create disconnected lines and make the audit untraceable.
-
-### Level 1: System Context
-
-Show the software system as a single box, surrounded by users and external systems.
-
-### Level 2: Container
-
-Zoom into the system boundary (drawn as a dashed swimlane) to show containers with technology choices. Include people and external systems outside the boundary for context.
-
-### Level 3: Component (on request or when drilling into a container)
-
-Zoom into a single container (drawn as a dashed swimlane) to show its components. Include surrounding containers and external systems for context.
-
-## Step 4: Layout Audit (mandatory — do this before saving)
-
-After generating the initial diagram XML, perform this audit. If any check fails, adjust the layout and re-check. Repeat until all checks pass.
-
-### 4a. Trace every edge path
-
-For each edge, compute the full path by listing the absolute coordinates of every point: source exit point → each waypoint → target entry point. Each pair of consecutive points forms a segment. For each segment, check if the line intersects the bounding box (x, y, width, height) of every element in the diagram. If any segment crosses through any element:
-- Route the edge around the element using waypoints along the boundary margin, OR
-- Spread elements apart to create a clear lane
-
-Special attention: edges connecting to external systems below the boundary often have a long vertical segment dropping through lower rows. These MUST route along the left or right margin first, then drop below the boundary, then turn horizontally to reach the target.
-
-### 4b. Check edge label positions for overlaps
-
-draw.io places edge labels at the midpoint of the longest segment of an orthogonal edge. For each edge label:
-1. Identify the longest segment of the edge path (the segment with the greatest length).
-2. The label will be centered on this segment's midpoint.
-3. Estimate the label's bounding box conservatively: use **12px per character width** and **20px per line height**, centered on the midpoint. Over-estimating is better than under-estimating — a little extra spacing is preferable to overlapping text.
-4. Check if this bounding box overlaps with:
-   - Any element (container, component, person, external system)
-   - The boundary title text (the swimlane header area)
-   - Any other edge label's estimated bounding box
-
-**Critical:** If the longest segment is a vertical drop adjacent to a container (e.g., the last segment before the target), the label will render on top of that neighboring container. To fix this, add waypoints so that the longest segment is in open space — typically a horizontal run through a clear gap between rows.
-
-If an overlap is found:
-- Add waypoints to change which segment is longest, moving where the label lands, OR
-- Shift the edge's exit/entry points to move the longest segment to a clear area, OR
-- Increase spacing between the elements the edge connects
-
-### 4c. Check parallel and bundled edges
-
-**No two edges may share the same waypoint Y or X value.** When edges share a coordinate, draw.io renders them on top of each other, making it impossible to trace which line goes where. For every pair of edges:
-- Each edge must use a **unique waypoint Y** (for horizontal runs) or **unique waypoint X** (for vertical runs), separated by at least **30px**.
-- Source exit points from the same element must be spread across different positions (e.g., exitX=0.2 vs exitX=0.5 vs exitX=0.8). Never use the same exit point for two edges.
-- Target entry points on the same element must also be spread (e.g., entryX=0.3 vs entryX=0.7).
-- If two edges run parallel for any segment, offset one of them by at least 30px using additional waypoints.
-
-Labels on parallel edges will also overlap if the edges are close. Since labels are centered on the longest segment, ensure the longest segments of nearby edges are in **different spatial areas** — either at different Y levels or with enough horizontal separation (at least 200px between label centers).
-
-### 4d. Verify boundary title clearance
-
-Check that no edge or edge label passes through or overlaps the boundary's swimlane header (the `startSize` area, typically 30px tall at the top of the boundary). If edges enter the boundary from above, they should connect at the boundary top edge, not route through the title.
-
-## Step 5: Output
-
-1. **Save the draw.io XML** to the project root:
-   - Filenames: `system-context.drawio`, `containers.drawio`, `components-[container-name].drawio`
-
-2. **Export to PNG** by running `drawio --export --format png --output <name>.png <name>.drawio` via Bash.
-
-3. **Open in draw.io desktop** by running `drawio <filepath>` via Bash.
-
-4. **Print a summary** of what was discovered:
-   - Number of containers, components, external systems, and user types identified
-   - Key technology choices detected
-   - Any ambiguities or areas where manual refinement is recommended
-
-## Guidelines
-
-- **Be specific, not generic.** "FastAPI REST server handling order management" beats "API server."
-- **Include technology choices.** The actual framework, database, protocol — not just "web app" or "database."
-- **Annotate every relationship** with its purpose and protocol/mechanism.
-- **Don't mix abstraction levels.** A container diagram shows containers, not classes.
-- **For monorepos:** Start with the top-level scan to show all services as containers. Offer to drill into specific services for components.
-- **When uncertain:** Note the ambiguity in the summary rather than guessing. Flag it for the user to clarify.
-- **Omit infrastructure cross-cutting concerns** (logging, monitoring) from component diagrams unless they are architecturally significant.
-- **Relationships should use prepositions** that match arrow direction: "reads from," "sends to," "makes API calls to."
diff --git a/c4-diagram/SKILL.md b/c4-diagram/SKILL.md
deleted file mode 100644
index 15948e4..0000000
--- a/c4-diagram/SKILL.md
+++ /dev/null
@@ -1,199 +0,0 @@
----
-name: c4-diagram
-description: Generate C4 architecture diagrams from a textual description of a software system. Use when the user describes a system they want to diagram, or wants to create architecture diagrams for a planned/proposed system. Dispatched by arch-document for context and container views; usable standalone when the system is described in prose rather than existing code. Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
-argument-hint: "[description or diagram level]"
----
-
-# /c4-diagram — Generate C4 Architecture Diagrams from Description
-
-Generate C4 model architecture diagrams based on a conversational description of a
-software system, following Simon Brown's methodology.
-
-## Arguments
-
-- A description of the system (e.g., `/c4-diagram e-commerce platform with React frontend and Python backend`)
-- A diagram level to generate (e.g., `/c4-diagram component api-server` — if a model has already been established in conversation)
-- No argument: ask the user to describe their system
-
-## Flow
-
-### Phase 1: Understand the System
-
-If the user provides a brief description, ask clarifying questions to fill gaps.
-Ask only what's needed — don't interrogate. Group questions logically.
-
-**Essential questions (ask if not provided):**
-
-1. **What does the system do?** (one-sentence purpose)
-2. **Who uses it?** (user types/roles — end users, admins, other systems, scheduled jobs)
-3. **What are the major pieces?** (web app, mobile app, API, database, workers, etc.)
-
-**Follow-up questions (ask if relevant, based on what was shared):**
-
-4. **What external systems does it integrate with?** (payment providers, email services, auth providers, third-party APIs)
-5. **What are the key technology choices?** (languages, frameworks, databases, message queues)
-6. **How do the pieces communicate?** (REST, GraphQL, gRPC, message queues, webhooks)
-
-Don't ask all questions at once. Start with 1-3, then follow up based on the answers.
-
-### Phase 2: Build the Model
-
-Organize the information into the C4 hierarchy:
-
-- **People**: Users, roles, external actors
-- **Software System**: The system being described (the thing in scope)
-- **External Systems**: Third-party services, legacy systems, partner APIs
-- **Containers**: The major deployable/runnable pieces
-- **Components**: Internal groupings within containers (only if user wants Level 3)
-
-For each element, define:
-- **Name**: Clear and specific
-- **Technology**: If known or decided; note "TBD" if still under consideration
-- **Description**: One sentence about its responsibility
-- **Relationships**: What connects to what, the purpose, and the mechanism
-
-### Phase 3: Generate Diagrams
-
-Generate diagrams as **draw.io XML** (`.drawio` files).
-
-Default behavior: generate Level 1 (System Context) and Level 2 (Container).
-Generate Level 3 (Component) only when the user asks to drill into a specific container.
-
-#### draw.io XML format
-
-Use the `<mxfile><diagram><mxGraphModel>` structure with `<mxCell>` elements.
-
-#### C4 color scheme
-
-| Element | Fill Color | Stroke Color | Font Color | Shape |
-|---------|-----------|-------------|-----------|-------|
-| Person | `#08427B` | `#073B6F` | `#ffffff` | `shape=mxgraph.c4.person2` |
-| System (in scope) | `#1168BD` | `#0B4884` | `#ffffff` | `rounded=1;arcSize=10` |
-| System (external) | `#999999` | `#8A8A8A` | `#ffffff` | `rounded=1;arcSize=10` |
-| Container | `#438DD5` | `#3C7FC0` | `#ffffff` | `rounded=1;arcSize=10` |
-| Container (database) | `#438DD5` | `#3C7FC0` | `#ffffff` | `shape=cylinder3` |
-| Component | `#85BBF0` | `#78A8D8` | `#000000` | `rounded=1;arcSize=10` |
-| Planned/future | `#CCCCCC` | `#AAAAAA` | `#ffffff` | `dashed=1` |
-| Boundary | none | `#0B4884` (system) / `#3C7FC0` (container) | match stroke | `swimlane;dashed=1;dashPattern=8 4` |
-
-#### Element label format
-
-Every element label should include (using HTML in the `value` attribute):
-- **Bold name**
-- Element type in small text: `[Person]`, `[Software System]`, `[Container: Technology]`, `[Component: Technology]`
-- Description in normal text
-
-Example: `<b>API Server</b><br><font style="font-size:10px">[Container: Django 6.0, Python 3.12]</font><br><br><font style="font-size:11px">REST API handling mission CRUD and agent orchestration</font>`
-
-#### Diagram structure rules
-
-- **Title**: Text cell at top with `fontSize=18;fontStyle=1`
-- **Key/legend**: Small boxes in a corner showing the color scheme
-- **Relationships**: `edgeStyle=orthogonalEdgeStyle;rounded=1` with label showing purpose + protocol. Always use proper `source` and `target` attributes referencing element IDs — never use manual `sourcePoint`/`targetPoint` coordinates, which render as floating disconnected lines.
-- **Boundaries**: Use `swimlane` style with `container=1;collapsible=0` — child elements use `parent="boundaryId"`
-- **Layout**: People at top, system in scope in center, external systems around edges/bottom
-- **Canvas size**: `pageWidth="1600" pageHeight="1200"` minimum; increase for complex diagrams
-
-#### Spacing and layout rules (critical)
-
-These rules prevent overlapping text and unreadable diagrams:
-
-- **140px minimum** between context elements (people, external containers) and the boundary top, so edge labels don't overlap the boundary title text.
-- **200px minimum** between boundary bottom and external systems below, so edge labels between them are readable and don't overlap the boundary line.
-- **150px minimum gap** between sibling elements in the same row (containers, components), so relationship labels between them have room.
-- **Center contents inside boundaries**: calculate total content width, subtract from boundary width, divide by 2 for left margin. Do this for each row independently.
-- **Use explicit waypoints** (`<Array as="points">`) to route edges around obstacles. Waypoints should run through the gap between the boundary edge and external systems.
-
-#### Edge routing rules
-
-- **Never route an edge through a component or container.** If an edge would cross an element, spread elements apart to create a clear lane, or reroute the edge with waypoints.
-- **Edges that skip rows must route around, not through.** If a service in row 2 connects to an external system below the boundary, and row 3 (data stores) is in between, the edge must route along the left or right margin of the boundary and exit from the bottom — never vertically through row 3. Use waypoints to hug the boundary edge.
-- **Cross-row edges** (e.g., connecting elements that aren't adjacent) must route above or below the intervening row using waypoints, not through it.
-- **Edges exiting a boundary** to reach external systems should use waypoints at intermediate Y positions between the boundary bottom and the external system top, with each edge at a different Y to avoid overlapping.
-- **All edges must use proper `source` and `target` attributes** referencing element IDs. Never use `sourcePoint`/`targetPoint` manual coordinates — these create disconnected lines and make the audit untraceable.
-
-#### Level 1: System Context
-
-Show the software system as a single box, surrounded by users and external systems.
-
-#### Level 2: Container
-
-Zoom into the system boundary (dashed swimlane) to show containers with technology choices. Include people and external systems outside the boundary for context.
-
-#### Level 3: Component
-
-Zoom into a single container (dashed swimlane) to show its components. Include surrounding containers and external systems for context.
-
-#### Deployment Diagram (if the user asks about infrastructure)
-
-Show deployment nodes (nested boxes) containing containers, mapped to infrastructure.
-
-### Phase 4: Layout Audit (mandatory — do this before saving)
-
-After generating the initial diagram XML, perform this audit. If any check fails, adjust the layout and re-check. Repeat until all checks pass.
-
-#### 4a. Trace every edge path
-
-For each edge, compute the full path by listing the absolute coordinates of every point: source exit point → each waypoint → target entry point. Each pair of consecutive points forms a segment. For each segment, check if the line intersects the bounding box (x, y, width, height) of every element in the diagram. If any segment crosses through any element:
-- Route the edge around the element using waypoints along the boundary margin, OR
-- Spread elements apart to create a clear lane
-
-Special attention: edges connecting to external systems below the boundary often have a long vertical segment dropping through lower rows. These MUST route along the left or right margin first, then drop below the boundary, then turn horizontally to reach the target.
-
-#### 4b. Check edge label positions for overlaps
-
-draw.io places edge labels at the midpoint of the longest segment of an orthogonal edge. For each edge label:
-1. Identify the longest segment of the edge path (the segment with the greatest length).
-2. The label will be centered on this segment's midpoint.
-3. Estimate the label's bounding box conservatively: use **12px per character width** and **20px per line height**, centered on the midpoint. Over-estimating is better than under-estimating — a little extra spacing is preferable to overlapping text.
-4. Check if this bounding box overlaps with:
-   - Any element (container, component, person, external system)
-   - The boundary title text (the swimlane header area)
-   - Any other edge label's estimated bounding box
-
-**Critical:** If the longest segment is a vertical drop adjacent to a container (e.g., the last segment before the target), the label will render on top of that neighboring container. To fix this, add waypoints so that the longest segment is in open space — typically a horizontal run through a clear gap between rows.
-
-If an overlap is found:
-- Add waypoints to change which segment is longest, moving where the label lands, OR
-- Shift the edge's exit/entry points to move the longest segment to a clear area, OR
-- Increase spacing between the elements the edge connects
-
-#### 4c. Check parallel and bundled edges
-
-**No two edges may share the same waypoint Y or X value.** When edges share a coordinate, draw.io renders them on top of each other, making it impossible to trace which line goes where. For every pair of edges:
-- Each edge must use a **unique waypoint Y** (for horizontal runs) or **unique waypoint X** (for vertical runs), separated by at least **30px**.
-- Source exit points from the same element must be spread across different positions (e.g., exitX=0.2 vs exitX=0.5 vs exitX=0.8). Never use the same exit point for two edges.
-- Target entry points on the same element must also be spread (e.g., entryX=0.3 vs entryX=0.7).
-- If two edges run parallel for any segment, offset one of them by at least 30px using additional waypoints.
-
-Labels on parallel edges will also overlap if the edges are close. Since labels are centered on the longest segment, ensure the longest segments of nearby edges are in **different spatial areas** — either at different Y levels or with enough horizontal separation (at least 200px between label centers).
-
-#### 4d. Verify boundary title clearance
-
-Check that no edge or edge label passes through or overlaps the boundary's swimlane header (the `startSize` area, typically 30px tall at the top of the boundary). If edges enter the boundary from above, they should connect at the boundary top edge, not route through the title.
-
-### Phase 5: Output
-
-1. **Save the draw.io XML** to the project root (or current directory):
-   - Filenames: `system-context.drawio`, `containers.drawio`, `components-[name].drawio`
-
-2. **Export to PNG** by running `drawio --export --format png --output <name>.png <name>.drawio` via Bash.
-
-3. **Open in draw.io desktop** by running `drawio <filepath>` via Bash.
-
-4. **Offer next steps:**
-   - "Want me to zoom into any container for a component diagram?"
-   - "Want to add a deployment diagram?"
-   - "Anything to adjust?"
-
-## Guidelines
-
-- **Be conversational, not bureaucratic.** Don't force the user through a rigid questionnaire. Adapt based on what they share.
-- **Infer where reasonable.** If the user says "React frontend with a Node API and Postgres," you don't need to ask what the database technology is.
-- **Be specific in element descriptions.** "Provides product search, cart management, and checkout" beats "Handles business logic."
-- **Include technology choices** even during upfront design. If the user hasn't decided, note options: "Database [PostgreSQL or MySQL — TBD]".
-- **Annotate every relationship** with purpose and protocol/mechanism.
-- **Don't mix abstraction levels.** Keep each diagram at one level of the C4 hierarchy.
-- **Use prepositions in relationship labels** that match arrow direction: "reads from," "sends to," "makes API calls to."
-- **For large systems:** Start with System Context, then Container. Offer Component diagrams for the most interesting containers rather than trying to diagram everything.
-- **Iterate.** After generating a diagram, ask if it looks right. The user often has corrections or additions after seeing the first version.
diff --git a/codify/SKILL.md b/codify/SKILL.md
deleted file mode 100644
index 4a0b5e6..0000000
--- a/codify/SKILL.md
+++ /dev/null
@@ -1,196 +0,0 @@
----
-name: codify
-description: Codify concrete, actionable insights from recent session work into the project's `CLAUDE.md` so they survive across sessions and compound over time. Harvests patterns that worked, anti-patterns that bit, API gotchas, specific thresholds, and verification checks. Filters against quality gates (atomic, evidence-backed, non-redundant, verifiable, safe, stable). Writes into a dedicated `## Codified Insights` section rather than scattering entries. Use after a productive session, a bug fix that revealed a non-obvious pattern, or an explicit review where you want learnings preserved as rules. Supports `--dry-run` to preview, `--max=N` to cap output, `--target=<path>` to write elsewhere, `--section=<name>` to override the destination section. Flags insights that look cross-project and suggests promotion to `~/code/rulesets/claude-rules/` instead. Do NOT use for session wrap-up / progress summaries (not insights), for private personal context (auto-memory handles that, not a tracked file), or for formal rules that belong in `.claude/rules/`. Informed by Agentic Context Engineering (ACE, arXiv:2510.04618) — grow-and-refine without context collapse.
----
-
-# Codify
-
-Turn transient session insights into durable, actionable entries in the project's `CLAUDE.md`. Each invocation should make the next session measurably better without diluting what's already there. Select carefully, phrase specifically, commit deliberately.
-
-## When to Use
-
-- End of a productive session where you want concrete patterns preserved
-- After a bug fix that revealed a non-obvious constraint or gotcha
-- After a review where a pattern was identified as worth repeating (or avoiding)
-- When you notice yourself re-deriving the same insight — it belongs written down
-
-## When NOT to Use
-
-- For a session wrap-up summary (that's narrative, not an insight)
-- For personal/private context (auto-memory at `~/.claude/projects/<cwd>/memory/` captures that — see below)
-- For formal project rules — those belong in `.claude/rules/*.md`
-- When you have no specific, evidence-backed insight yet — skip rather than fabricate
-
-## Relationship to Other Memory Systems
-
-Three distinct systems, zero overlap:
-
-| System | Location | Scope | Purpose |
-|---|---|---|---|
-| **auto-memory** | `~/.claude/projects/<cwd>/memory/` | Private, per-working-directory | Session-bridging context about the user and project (feedback, user traits, project state). Written continuously by the agent. |
-| **`/codify` (this skill)** | Project `CLAUDE.md` | Public, tracked, per-project | Explicit, curated rules and patterns. Written deliberately by the user invocation. |
-| **Formal rules** | `.claude/rules/*.md`, `~/code/rulesets/claude-rules/` | Public, tracked, per-project or global | Stable policy (style, conventions, verification). Authored once, rarely updated. |
-
-Use the right system for the right content.
-
-## Workflow
-
-Four phases. Each can be skipped if it has no content; none should be silently merged.
-
-### Phase 1 — Harvest
-
-Identify candidate insights from recent work. Look at:
-
-- The session transcript (or files referenced by `--source`)
-- Recent commits and their messages
-- Any `.architecture/evaluation-*.md` from `arch-evaluate`
-- Reflection or critique outputs if they exist
-- Anti-patterns you caught yourself falling into
-
-**Extract only:**
-
-- **Patterns that worked** — preferably with a minimum precondition and a worked example or reference
-- **Anti-patterns that bit** — with the observable symptom and the reason
-- **API / tool gotchas** — auth quirks, rate limits, idempotency, error codes
-- **Verification items** — concrete checks that would catch regressions next time
-- **Specific thresholds** — "pagination above 50 items" not "pagination when needed"
-
-**Exclude:**
-
-- Progress narrative ("today we shipped X")
-- Personal preferences ("I like functional style")
-- Vague aphorisms ("write good code")
-- Unverified claims (if you can't cite code, docs, or repeated observation, skip)
-
-### Phase 2 — Filter (Grow-and-Refine)
-
-For each candidate insight, apply these gates. Fail any → drop the entry.
-
-- **Actionable.** A reader could apply this immediately. "Write good code" fails; "For dataset lookups under ~100 items, Object outperforms Map in V8" passes.
-- **Specific.** Names a threshold, a file, a flow, a version, or a named tool. Generic insights are noise.
-- **Evidence-backed.** Derived from code you just read, docs you just verified, or a pattern observed more than once. Speculation doesn't count.
-- **Atomic.** One idea per bullet. If the insight has two distinct parts, it's two bullets.
-- **Non-redundant.** Check existing `CLAUDE.md` content. If something similar exists, prefer merging or skipping over duplicating. If the new one is genuinely more specific and evidence-backed than the existing one, append it and mark the older one with `(candidate for consolidation)` — don't auto-delete prior user content.
-- **Safe.** No secrets, tokens, private URLs, or PII. Nothing that would leak in a public commit.
-- **Stable.** Prefer patterns that'll remain valid. If version-specific, say so.
-
-### Phase 3 — Write
-
-Write approved insights to a dedicated section of `CLAUDE.md`. Default section name: **`## Codified Insights`**. Override with `--section=<name>`.
-
-**Discipline:**
-
-- **One section only.** Don't scatter entries across CLAUDE.md. All codified content in one place means future `/codify` runs and human readers find it fast.
-- **Create the section if absent.** Place it near the end of CLAUDE.md, before any footer links.
-- **Preserve chronology within the section.** Newer entries appended; don't shuffle.
-- **Include provenance.** Each entry gets a date and, where useful, a one-word source hint (`pattern:`, `gotcha:`, `threshold:`, `anti-pattern:`, `verify:`).
-
-**Entry format:**
-
-```markdown
-- **<short title or rule>.** <One or two sentences. Concrete. Actionable.> (`<source-hint>` — YYYY-MM-DD)
-```
-
-Examples:
-
-```markdown
-- **Pagination threshold.** Fetch endpoints returning >50 items must paginate; clients assume everything ≤50 is complete. (`threshold` — 2026-04-19)
-- **Map vs Object for small lookups.** In V8, Object outperforms Map for <~100 keys; Map wins at 10k+. Use Object for hot config lookups. (`pattern` — 2026-04-19)
-- **Never log `load-file-name` from batch-compile context.** Both `load-file-name` and `buffer-file-name` are nil during top-level evaluation; `file-name-directory nil` raises `stringp, nil`. (`gotcha` — 2026-04-19)
-```
-
-### Phase 4 — Validate
-
-After writing, check:
-
-- [ ] Every entry passed all Phase 2 gates
-- [ ] Each entry is atomic (one idea)
-- [ ] No near-duplicates were created
-- [ ] The `## Codified Insights` section is coherent — entries flow, categories aren't interleaved randomly
-
-If the validation surfaces a problem, fix before exiting.
-
-## Cross-Project Promotion
-
-Some insights apply to *all* your projects, not just this one. Examples:
-
-- "Always emit JSON with a stable key order for git diffs"
-- "For TypeScript libraries, expose types via `package.json#exports`"
-
-When an insight reads as general rather than project-specific, the skill emits a **promotion hint** at the end of the run:
-
-```
-Promotion candidates:
-- "JSON stable key order" — reads as general. Consider adding to:
-    ~/code/rulesets/claude-rules/style.md
-  (would apply to every project via global install)
-
-Keep as project-specific, promote, or drop? [k/p/d]
-```
-
-Promotion happens manually — the skill doesn't edit the rulesets repo automatically. The hint is a nudge to think about scope.
-
-## Arguments
-
-- **`--dry-run`** — show the proposed entries and where they'd be written; do not modify any files.
-- **`--max=N`** — cap output to the top N insights by specificity + evidence.
-- **`--target=<path>`** — write to a different file. Defaults to `./CLAUDE.md`. Use e.g. `docs/learnings.md` if the project prefers a separate file.
-- **`--section=<name>`** — override the default `## Codified Insights` section name.
-- **`--source=<spec>`** — scope what gets harvested. Values: `last` (most recent message), `selection` (a user-highlighted region if supported), `chat:<id>` (a specific past conversation), `commits:<range>` (e.g., `commits:HEAD~10..`). Defaults to a reasonable window of recent session context.
-
-## Output
-
-On a real run (not `--dry-run`):
-
-1. Short summary — "added N entries to `<target>`: X patterns, Y gotchas, Z thresholds."
-2. Any promotion candidates flagged for global-rules consideration.
-3. Confirmation of the file path modified.
-
-On `--dry-run`:
-
-1. Preview of each proposed entry with the section it would land in.
-2. Flagged promotions.
-3. Explicit confirmation nothing was written.
-
-## Anti-Patterns
-
-- **Summarizing the session instead of extracting insights.** "Today we refactored X" is narrative. "X's public API requires parameters in Y order due to Z" is an insight.
-- **Writing entries without evidence.** If you can't point to code, docs, or multiple observations, the entry is speculation.
-- **Overwriting prior content.** Mark conflicts for consolidation; don't auto-delete what the user wrote.
-- **Scattering entries.** One section. Grep-able. Coherent.
-- **Batch-writing 20 entries in one session.** If the session generated 20 real insights, many of them aren't. Filter harder. 3-5 genuine entries per run is typical.
-- **Adding to `CLAUDE.md` when auto-memory is the right system.** Private user-context goes in auto-memory; public project rules go here; static policy goes in `.claude/rules/`.
-- **Promoting too eagerly.** "This applies to all projects" is a strong claim. If you can't name three unrelated projects where the rule would fire, it's project-specific.
-
-## Review Checklist
-
-Before accepting the additions:
-
-- [ ] Each entry has a source hint and a date
-- [ ] Each entry passes the atomic / specific / evidence-backed / non-redundant / safe / stable gates
-- [ ] The `## Codified Insights` section exists exactly once in the target
-- [ ] Promotion candidates (if any) have been flagged, not silently promoted
-- [ ] `--dry-run` was used at least once if the target file is under active template management (e.g. bundled CLAUDE.md from rulesets install)
-
-## Maintenance
-
-`CLAUDE.md` accumulates over time. Periodically:
-
-- **Consolidate.** Entries marked `(candidate for consolidation)` deserve a look. Often the older entry is superseded; sometimes it's a special case of the newer one.
-- **Retire.** Entries about deprecated tools or obsolete versions should be removed explicitly (with a commit message noting the retirement).
-- **Promote.** Re-scan for entries that have started firing across multiple projects — those belong in `~/code/rulesets/claude-rules/`.
-- **Trim.** If the section grows past ~40 entries, either the project is complex enough to warrant splitting `CLAUDE.md` into multiple files, or the entries haven't been curated aggressively enough.
-
-## Theoretical Background
-
-The grow-and-refine / evidence-backed approach draws on Agentic Context Engineering (Zhang et al., *Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models*, arXiv:2510.04618). Key ideas borrowed:
-
-- **Generation → Reflection → Curation** as distinct phases, not a single compression step
-- **Grow-and-refine** — accumulate granular knowledge rather than over-compressing to vague summaries
-- **Avoiding context collapse** — resist the temptation to rewrite old entries into smoother prose; specificity is the value
-
-This skill implements the Curation phase. Reflection and Generation happen in the main conversation.
-
-## Content scope
-
-Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/create-v2mom/SKILL.md b/create-v2mom/SKILL.md
deleted file mode 100644
index 9d2beef..0000000
--- a/create-v2mom/SKILL.md
+++ /dev/null
@@ -1,426 +0,0 @@
----
-name: create-v2mom
-description: Create a V2MOM (Vision, Values, Methods, Obstacles, Metrics) strategic framework for any project, goal, or domain — personal infrastructure, business strategy, health goals, financial planning, software development, career planning, or life goals. Walks the user through the five sections in order: Vision first (aspirational picture of success), then Values (2-4 principles that guide decisions), Methods (4-7 prioritized approaches with concrete actions), Obstacles (honest personal and external challenges), and Metrics (measurable outcomes, not vanity metrics). Includes an optional task-migration phase that consolidates an existing todo list under the defined Methods. Use when the user asks to create a V2MOM, build a strategic plan, set goals for a significant project, or apply ruthless prioritization without an existing framework. Do NOT use for simple todo lists, single-decision prompts (use /arch-decide), quick task brainstorming (use /brainstorm), or daily/weekly planning of routine tasks. Produces a document that becomes the project's decision-making source of truth.
----
-
-# /create-v2mom — Create a V2MOM Strategic Framework
-
-Transform vague intentions into a concrete action plan using V2MOM: Vision, Values, Methods, Obstacles, Metrics. Originated at Salesforce; works for any domain.
-
-## Problem Solved
-
-Without a strategic framework, projects suffer from:
-
-- **Unclear direction** — "get healthier" / "improve my finances" is too vague to act on; every idea feels equally important; no principled way to say "no."
-- **Priority inflation** — everything feels urgent; research/planning without execution; active todo list grows beyond manageability.
-- **No decision framework** — debates about A vs B waste time; second-guessing after decisions; perfectionism masquerading as thoroughness.
-- **Unmeasurable progress** — can't tell if work is actually making things better; no objective "done" signal; vanity metrics only.
-
-## When to Use
-
-- Starting a significant project (new business, new habit, new system)
-- Existing project has accumulated many competing priorities without clear focus
-- You find yourself constantly context-switching between ideas
-- Someone asks "what are you trying to accomplish?" and the answer is vague
-- Annual or quarterly planning for ongoing projects or life goals
-
-Particularly valuable for: personal infrastructure (tooling, systems, workflows), health and fitness, financial planning, software package development, business strategy, career development.
-
-## Exit Criteria
-
-The V2MOM is complete when:
-
-1. **All 5 sections have concrete content:**
-   - Vision: Clear, aspirational picture of success
-   - Values: 2-4 principles that guide decisions
-   - Methods: 4-7 concrete approaches with specific actions
-   - Obstacles: Honest personal/technical challenges
-   - Metrics: Measurable outcomes (not vanity metrics)
-2. **It's useful for decision-making:** can answer "does X fit this V2MOM?" quickly; provides priority clarity (Method 1 > Method 2 > etc.); identifies what NOT to do.
-3. **Both parties agree it's ready:** feels complete not rushed; actionable enough to start execution; honest about obstacles (not sugar-coated).
-
-**Validation questions:**
-- Can you articulate the vision in one sentence?
-- Do the values help you say "no" to things?
-- Are methods ordered by priority?
-- Can you immediately identify 3-5 tasks from Method 1?
-- Do metrics tell you if you're succeeding?
-
-## Instructions
-
-Complete phases in order. Vision informs Values. Values inform Methods. Methods reveal Obstacles. Everything together defines Metrics.
-
-### Phase 1: Ensure shared understanding of the framework
-
-Confirm both parties understand what each section means:
-
-- **Vision:** What you want to achieve (aspirational, clear picture of success)
-- **Values:** Principles that guide decisions (2-4 values, defined concretely)
-- **Methods:** How you'll achieve the vision (4-7 approaches, ordered by priority)
-- **Obstacles:** What's in your way (honest, personal, specific)
-- **Metrics:** How you'll measure success (objective, not vanity metrics)
-
-### Phase 2: Create the document structure
-
-1. Create file: `docs/[project-name]-v2mom.org` or appropriate location.
-2. Add metadata: `#+TITLE`, `#+AUTHOR`, `#+DATE`, `#+FILETAGS`.
-3. Create section headings for all 5 components.
-4. Add a "What is V2MOM?" overview section at top.
-
-Save incrementally after each section — V2MOM discussions can run long.
-
-### Phase 3: Define the Vision
-
-**Ask:** "What do you want to achieve? What does success look like?"
-
-**Goal:** Clear, aspirational picture. 1-3 paragraphs describing the end state.
-
-**Your role:**
-- Help articulate what's described
-- Push for specificity ("works great" → what specifically works?)
-- Identify scope (what's included, what's explicitly out)
-- Capture concrete examples the user mentions
-
-**Good vision characteristics:**
-- Paints a picture you can visualize
-- Describes outcomes, not implementation
-- Aspirational but grounded in reality
-- Specific enough to know what's included
-
-**Examples across domains:**
-- Health: "Wake up with energy, complete a 5K without stopping, feel strong in daily activities, stable mood throughout the day."
-- Finance: "Six months emergency fund, debt-free except mortgage, automatic retirement savings, financial decisions that don't cause anxiety."
-- Software: "A package that integrates seamlessly, has comprehensive documentation, handles edge cases gracefully, that maintainers of other packages want to depend on."
-
-**Time estimate:** 15-30 minutes if mostly clear; 45-60 minutes if it needs exploration.
-
-### Phase 4: Define the Values
-
-**Ask:** "What principles guide your decisions? When faced with A vs B, what values help you decide?"
-
-**Goal:** 2-4 values with concrete definitions and examples.
-
-**Your role:**
-- Suggest values based on vision discussion
-- Push for concrete definitions (not just the word, but what it MEANS)
-- Help distinguish between overlapping values
-- Identify when examples contradict stated values
-
-**Common pitfall:** Listing generic words without defining them.
-- Bad: "Quality, Speed, Innovation"
-- Good: "Sustainable means can maintain this for 10+ years without burning out. No crash diets, no 80-hour weeks, no technical debt I can't service."
-
-**For each value, capture:**
-1. The value name (1-2 words)
-2. Definition (what it means in context of this project)
-3. Concrete examples (how it manifests)
-4. What breaks this value (anti-patterns)
-
-**Method:** Start with 3-5 candidates. For each, ask "what does [value] mean to you in this context?" Discuss until the definition is concrete. Refine, merge, remove until 2-4 remain.
-
-**Examples:**
-- Health: "Sustainable: Can do this at 80 years old. No extreme diets. Focus on habits that compound over decades."
-- Finance: "Automatic: Set up once, runs forever. Don't rely on willpower for recurring decisions."
-- Software: "Boring: Use proven patterns. No clever code. Maintainable by intermediate developers. Boring is reliable."
-
-**Time estimate:** 30-45 minutes.
-
-### Phase 5: Define the Methods
-
-**Ask:** "How will you achieve the vision? What approaches will you take?"
-
-**Goal:** 4-7 methods (concrete approaches) ordered by priority.
-
-**Your role:**
-- Extract methods from vision and values discussion
-- Help order by priority (what must happen first?)
-- Ensure methods are actionable (not just categories)
-- Push for concrete actions under each method
-- Watch for method ordering that creates dependencies
-
-**Structure for each method:**
-1. Method name (verb phrase: "Build X", "Eliminate Y", "Establish Z")
-2. Aspirational description (1-2 sentences: why this matters)
-
-**Method ordering matters:**
-- Method 1 should be highest priority (blocking everything else)
-- Lower-numbered methods should enable higher-numbered ones
-- Common ordering patterns:
-  - Fix → Stabilize → Build → Enhance → Sustain
-  - Eliminate → Replace → Optimize → Automate → Maintain
-  - Learn → Practice → Apply → Teach → Systematize
-
-**Examples:**
-
-Health:
-- Method 1: Eliminate Daily Energy Drains (fix sleep, reduce inflammatory foods, address deficiencies)
-- Method 2: Build Baseline Strength (3x/week resistance, progressive overload, compound movements)
-- Method 3: Establish Sustainable Nutrition (meal prep, protein targets, vegetable servings)
-
-Finance:
-- Method 1: Stop the Bleeding (eliminate wasteful subscriptions, high-interest debt, impulse purchases)
-- Method 2: Build the Safety Net (automate savings, reach $1000 fund, then 3 months expenses)
-- Method 3: Invest for the Future (max 401k match, open IRA, automatic contributions)
-
-Software Package:
-- Method 1: Nail the Core Use Case (solve one problem extremely well, clear docs, handle errors gracefully)
-- Method 2: Ensure Quality and Stability (comprehensive tests, CI/CD, semantic versioning)
-- Method 3: Build Community and Documentation (contribution guide, examples, responsive to issues)
-
-**Ordering is flexible until it isn't:** After defining all methods, you may realize the ordering is wrong. Swap them. The order represents priority — getting it right matters more than preserving the initial draft.
-
-**Time estimate:** 45-90 minutes (longest section).
-
-### Phase 5.5: Brainstorm tasks for each method
-
-For each method, brainstorm what's missing to achieve it.
-
-**Ask:** "What else would help achieve this method's goal?"
-
-**Your role:**
-- Suggest additional tasks based on the method's aspirational description
-- Consider edge cases and error scenarios
-- Identify automation opportunities
-- Propose monitoring/visibility improvements
-- Challenge if the list feels incomplete (can't reach the goal)
-- Challenge if the list feels bloated (items don't contribute to the goal)
-- Create sub-tasks for items with multiple steps
-- Ensure priorities reflect contribution to the method's goal
-
-**For each brainstormed task:**
-- Describe what it does and why it matters
-- Assign priority based on contribution to the method
-- Add technical details if known
-- Get user agreement before adding
-
-**Priority system (org-mode):**
-- `[#A]` Critical blockers — must be done first, blocks everything else
-- `[#B]` High-impact reliability — directly enables the method goal
-- `[#C]` Quality improvements — valuable but not blocking
-- `[#D]` Nice-to-have — low priority, can defer
-
-**Time estimate:** 10-15 minutes per method (~50-75 min for 5 methods).
-
-### Phase 6: Identify the Obstacles
-
-**Ask:** "What's in your way? What makes this hard?"
-
-**Goal:** Honest, specific obstacles — both personal and technical/external.
-
-**Your role:**
-- Encourage honesty (obstacles are reality, not failures)
-- Help distinguish symptoms from root causes
-- Identify patterns in behavior that create obstacles
-- Acknowledge challenges without judgment
-
-**Good obstacle characteristics:**
-- Honest about personal patterns
-- Specific, not generic
-- Acknowledges both internal and external obstacles
-- States real stakes (not just "might happen")
-
-**Common obstacle categories:**
-- Personal: perfectionism, hard to say no, gets bored, procrastinates
-- Knowledge: missing skills, unclear how to proceed, need to learn
-- External: limited time, limited budget, competing priorities
-- Systemic: environmental constraints, missing tools, dependencies on others
-
-**For each obstacle:** name it clearly, describe how it manifests in this project, acknowledge the stakes (what happens because of it).
-
-**Examples:**
-
-Health:
-- "I get excited about new workout programs and switch before seeing results (pattern: 6 weeks into a program)"
-- "Social events involve food and alcohol — saying no feels awkward and isolating"
-- "When stressed at work, I skip workouts and eat convenient junk food"
-
-Finance:
-- "Viewing budget as restriction rather than freedom — triggers rebellion spending"
-- "FOMO on lifestyle experiences my peers have"
-- "Limited financial literacy — don't understand investing beyond 'put money in account'"
-
-Software:
-- "Perfectionism delays releases — always 'one more feature' before v1.0"
-- "Maintaining documentation feels boring compared to writing features"
-- "Limited time (2-4 hours/week) and competing projects"
-
-**Time estimate:** 15-30 minutes.
-
-### Phase 7: Define the Metrics
-
-**Ask:** "How will you measure success? What numbers tell you if this is working?"
-
-**Goal:** 5-10 metrics — objective, measurable, aligned with vision/values/methods.
-
-**Your role:**
-- Suggest metrics based on vision, values, methods
-- Push for measurable numbers (not "better" — concrete targets)
-- Identify vanity metrics (look good but don't measure real progress)
-- Ensure metrics align with values and methods
-
-**Metric categories:**
-- **Performance** — measurable outcomes of the work
-- **Discipline** — process adherence, consistency, focus
-- **Quality** — standards maintained, sustainability indicators
-
-**Good metric characteristics:**
-- Objective (not subjective opinion)
-- Measurable (can actually collect the data)
-- Actionable (can change behavior to improve it)
-- Aligned with values and methods
-
-**For each metric, capture:** name, current state (if known), target state, how to measure, measurement frequency.
-
-**Examples:**
-
-Health:
-- Resting heart rate: 70 bpm → 60 bpm (daily via fitness tracker)
-- Workout consistency: 3x/week strength training for 12 consecutive weeks
-- Sleep quality: 7+ hours per night 6+ nights per week (sleep tracker)
-- Energy rating: subjective 1-10 scale, target 7+ weekly average
-
-Finance:
-- Emergency fund: $0 → $6000 (monthly)
-- High-interest debt: $8000 → $0 (monthly)
-- Savings rate: 5% → 20% of gross income (monthly)
-- Financial anxiety: weekly check-in, target "comfortable with financial decisions"
-
-Software:
-- Test coverage: 0% → 80% (coverage tool)
-- Issue response time: median < 48 hours (GitHub stats)
-- Documentation completeness: all public APIs documented with examples
-- Adoption: 10+ GitHub stars, 3+ projects depending on it
-
-**Time estimate:** 20-30 minutes.
-
-### Phase 8 (optional): Migrate existing tasks
-
-If there's an existing `TODO.org` or task list, migrate it under the V2MOM methods.
-
-**Goal:** Consolidate all project tasks under V2MOM methods, eliminate duplicates, move non-fitting items to someday-maybe.
-
-**Process:**
-
-1. **Identify duplicates** — read existing TODO, find tasks already in V2MOM methods, check if V2MOM task has all technical details from the TODO version, enhance if needed, mark original for deletion.
-2. **Map tasks to methods** — for each remaining task, ask "which method does this serve?" Add under appropriate method with priority. Preserve task state (DOING, VERIFY, etc.).
-3. **Review someday-maybe candidates one-by-one** — for each task that doesn't fit methods, ask: keep in V2MOM (which method)? Move to someday-maybe? Delete?
-4. **Final steps** — append someday-maybe items to `docs/someday-maybe.org`; copy completed V2MOM to TODO.org (overwriting). V2MOM becomes the single source of truth.
-
-**Keep in V2MOM:** DOING tasks (work in progress), VERIFY tasks (need testing/verification), tasks that enable method goals.
-
-**Move to someday-maybe:** Doesn't directly serve a method's goal; nice-to-have without clear benefit; research task without actionable outcome; architectural change decided not to pursue; unrelated personal task.
-
-**Delete entirely:** Obsolete tasks (feature removed, problem solved elsewhere); duplicate of something done; task that no longer makes sense.
-
-**Review one task at a time** — don't batch. Capture reasoning.
-
-**Time estimate:** Variable — small (~20 tasks) 30-45 min; medium (~50) 60-90 min; large (100+) 2-3 hours.
-
-This phase is optional — only needed if an existing todo list has substantial content.
-
-### Phase 9: Review and refine
-
-Once all sections are complete, review the whole V2MOM together:
-
-1. **Does the vision excite you?** (If not, why not? What's missing?)
-2. **Do the values guide decisions?** (Can you use them to say no to things?)
-3. **Are the methods ordered by priority?** (Is Method 1 truly most important?)
-4. **Are the obstacles honest?** (Or are you sugar-coating?)
-5. **Will the metrics tell you if you're succeeding?** (Or are they vanity metrics?)
-6. **Does this V2MOM make you want to DO THE WORK?** (If not, something is wrong.)
-
-**Refinement:** merge overlapping methods; reorder methods if priorities are wrong; add missing concrete actions; strengthen weak definitions; remove fluff.
-
-**Red flags:**
-- Vision doesn't excite you → Need to dig deeper into what you really want
-- Values are generic → Need concrete definitions and examples
-- Methods have no concrete actions → Too vague, need specifics
-- Obstacles are all external → Need honesty about personal patterns
-- Metrics are subjective → Need objective measurements
-
-### Phase 10: Commit and use
-
-1. Save the document in its appropriate location.
-2. Share with stakeholders (if applicable).
-3. Use it immediately — start Method 1 execution or the first triage.
-4. Schedule first review (1 week out): is this working?
-
-Use immediately to validate the V2MOM is practical, not theoretical. Execution reveals gaps that discussion misses.
-
-## Principles
-
-### Honesty over aspiration
-
-V2MOM requires brutal honesty, especially in Obstacles.
-
-- "I get bored after 6 weeks" (honest) vs "Maintaining focus is challenging" (bland)
-- "I have 3 hours per week max" (honest) vs "Time is limited" (vague)
-- "I impulse-spend when stressed" (honest) vs "Budget adherence needs work" (passive)
-
-**Honesty enables solutions.** If you can't name the obstacle, you can't overcome it.
-
-### Concrete over abstract
-
-Every section should have concrete examples and definitions.
-
-**Bad:** Vision "be successful" · Values "Quality, Speed, Innovation" · Methods "improve things" · Metrics "do better"
-
-**Good:** Vision "Complete a 5K in under 30 min, have energy to play with kids after work, sleep 7+ hours consistently" · Values "Sustainable: can maintain for 10+ years, no crash diets, no injury-risking overtraining" · Methods "Method 1: Fix sleep quality (blackout curtains, consistent bedtime, no screens 1hr before bed)" · Metrics "5K time: current 38 min → target 29 min (measure: monthly timed run)"
-
-### Priority ordering is strategic
-
-Method ordering determines what happens first. Get it wrong and you'll waste effort.
-
-Common patterns:
-- **Fix → Build → Enhance → Sustain** (eliminate problems before building)
-- **Eliminate → Replace → Optimize** (stop damage before improving)
-- **Learn → Practice → Apply → Teach** (build skill progressively)
-
-Method 1 must address the real blocker — if the foundation is broken, nothing built on it will hold; high-impact quick wins build momentum; must stop the bleeding before rehab.
-
-### Methods need concrete actions
-
-If you can't list 3-8 concrete actions for a method, it's too vague.
-
-**Test:** Can you start working on Method 1 immediately after completing the V2MOM? If the answer is "I need to think about what to do first," the method needs more concrete actions.
-
-- Too vague: "Method 1: Improve health"
-- Concrete: "Method 1: Fix sleep quality → blackout curtains, consistent 10pm bedtime, no screens after 9pm, magnesium supplement, sleep tracking"
-
-### Metrics must be measurable
-
-"Better" is not a metric. "Bench press 135 lbs" is a metric.
-
-For each metric, you must be able to answer:
-1. How do I measure this? (exact method or tool)
-2. What's the current state?
-3. What's the target state?
-4. How often do I measure it?
-5. What does this metric actually tell me?
-
-If you can't answer these, it's not a metric yet.
-
-### V2MOM is a living document
-
-Not set in stone. As you execute, expect: method reordering (new info reveals priorities), metric adjustments (too aggressive or too conservative), new obstacles emerging, refined value definitions.
-
-**Update when:** major priority shift occurs; new obstacle emerges that changes approach; metric targets prove unrealistic or too easy; method completion opens new possibilities; quarterly review reveals misalignment.
-
-**But not frivolously:** Changing the V2MOM every week defeats the purpose. Update on major shifts, not minor tactics.
-
-### Use it or lose it
-
-V2MOM only works if you use it for decisions.
-
-Use it for:
-- Weekly reviews (am I working on the right things?)
-- Priority decisions (which method does this serve?)
-- Saying no to distractions (not in the methods)
-- Celebrating wins (shipped Method 1 items)
-- Identifying blockers (obstacles getting worse?)
-
-If 2 weeks pass without referencing the V2MOM, something is wrong — either the V2MOM isn't serving you, or you're not using it.
-
-## Closing Test
-
-Can you say "no" to something you would have said "yes" to before? If so, the V2MOM is working.
diff --git a/finish-branch/SKILL.md b/finish-branch/SKILL.md
deleted file mode 100644
index 0903010..0000000
--- a/finish-branch/SKILL.md
+++ /dev/null
@@ -1,251 +0,0 @@
----
-name: finish-branch
-description: Complete a feature branch with a forced-choice menu of outcomes (merge locally / push + open PR / keep as-is / discard). Runs verification before offering options (tests, lint, typecheck per the project's conventions — delegates to `verification.md`). Requires typed confirmation for destructive deletion (no accidental work loss). Handles git worktree cleanup correctly: tears down for merge and discard, preserves for keep and push (where the worktree may still be needed for follow-up review or fixes). References existing rules for commit conventions (`commits.md`), review discipline (`review-code`), and verification (`verification.md`) — this skill is the workflow scaffold, not a re-teach of the underlying standards. Use when implementation is complete and you need to wrap up a branch. Do NOT use for mid-development merges (that's normal git flow), for the wrap-up *of a whole session* (different scope — session-end is narrative + handoff, not branch integration), for creating a new branch (no skill for that — just `git checkout -b`), or when review hasn't happened yet (run `/review-code` first, then this).
----
-
-# /finish-branch
-
-Complete a development branch cleanly. Verify, pick one of four outcomes, execute, clean up.
-
-## When to Use
-
-- Implementation is done, tests are written, you believe the branch is ready to move forward
-- You're about to ask "okay, now what?" — this skill exists to stop you asking and start you choosing
-- You need to integrate, preserve, or discard a branch's work deliberately
-
-## When NOT to Use
-
-- Mid-development merges — that's regular git flow, not "finishing" a branch
-- Full session wrap-up (closing the day's work, recording context for next time) — different scope, not about branch integration
-- Creating a new branch — just `git checkout -b <name>`
-- Before review has happened on a significant change — run `/review-code` first; this skill assumes the work has been judged ready
-
-## Phase 1 — Verify
-
-Before offering any option, run the project's verification. Delegate to `verification.md` discipline:
-
-- Tests pass (full suite, not just the last one you wrote)
-- Linter clean (no new warnings)
-- Type checker clean (no new errors)
-- Staged diff matches intent (no accidental additions)
-
-Commands vary by stack. Use what the project's Makefile, `package.json` scripts, or convention dictates. Examples:
-
-- JavaScript / TypeScript: `npm test && npm run lint && npx tsc --noEmit`
-- Python: `pytest && ruff check . && pyright`
-- Go: `go test ./... && go vet ./... && golangci-lint run`
-- Elisp: `make test && make validate-parens && make validate-modules`
-
-**If verification fails:** stop. Report the failures with file:line, do not offer the outcome menu. The branch isn't finished — fix the failures and re-invoke `/finish-branch`.
-
-**If verification passes:** continue to Phase 2.
-
-If any verification step fails and triggers sub-investigations, follow `subagents.md` — dispatch a fresh fix-agent per failure with pasted error context rather than retrying in the orchestrator.
-
-## Phase 2 — Determine Base Branch
-
-The four outcomes need to know what the branch is being integrated into. Find the base:
-
-```bash
-git merge-base HEAD main 2>/dev/null \
-  || git merge-base HEAD master 2>/dev/null \
-  || git merge-base HEAD develop 2>/dev/null
-```
-
-If none resolves, ask: "I couldn't find a merge base against `main`, `master`, or `develop`. What's the base branch for this work?" Don't guess.
-
-Also collect:
-- Current branch name: `git branch --show-current`
-- Commit range to integrate: `git log <base>..HEAD --oneline`
-- Unpushed commits on the remote: `git log @{u}..HEAD` if the branch tracks a remote, otherwise all commits are local
-- Whether the current directory is in a git worktree: `git worktree list | grep $(git branch --show-current)`
-
-These details feed the outcome prompt and the execution phases.
-
-## Phase 3 — Offer the Menu
-
-Present exactly these four options. Don't editorialize. Don't add a "recommendation." The user picks.
-
-```
-Branch <branch-name> is ready. It has N commits on top of <base-branch>.
-
-What would you like to do?
-
-1. Merge locally into <base-branch>     (integrate now, delete the branch)
-2. Push and open a Pull Request          (remote integration flow)
-3. Keep as-is                             (preserve branch and worktree for later)
-4. Discard                                (delete branch and commits — requires confirmation)
-
-Pick a number.
-```
-
-If the branch already has a remote and the user chose 1 (merge locally), note: "The branch is pushed to `origin/<branch-name>`. After merging locally, do you also want to delete the remote branch?" Ask; don't assume.
-
-**Stop and wait for the answer.** Don't guess, don't infer from context, don't proceed to Phase 4 until the user picks.
-
-## Phase 4 — Execute the Chosen Outcome
-
-### Option 1 — Merge Locally
-
-```bash
-git checkout <base-branch>
-git pull                              # ensure base is current
-git merge --no-ff <branch-name>       # --no-ff preserves the branch point in history
-```
-
-**Re-verify the merged result.** Run tests / lint / type check on the merged state — the merge may have integrated cleanly at the text level while breaking semantically.
-
-If verification passes:
-```bash
-git branch -d <branch-name>           # safe delete (refuses unmerged branches, which this one is merged)
-```
-
-If the branch had a remote:
-- If user confirmed removing the remote: `git push origin --delete <branch-name>`
-- Otherwise: leave the remote branch, note that the user should clean it up manually
-
-Clean up the worktree (Phase 5).
-
-### Option 2 — Push and Open a Pull Request
-
-```bash
-git push -u origin <branch-name>      # -u sets upstream on first push
-```
-
-Open the PR. Use the project's `gh` CLI (install via `deps` target if missing):
-
-```bash
-gh pr create \
-  --base <base-branch> \
-  --title "<subject from the most recent commit, or user-provided>" \
-  --body "$(cat <<'EOF'
-## Summary
-
-<two or three bullets summarizing what changed, pulled from the commit range>
-
-## Test Plan
-
-- [ ] <steps the reviewer should take to verify>
-EOF
-)"
-```
-
-**Commit message and PR body discipline:** no AI attribution, no "🤖 Generated with" footer, conventional message style — see `commits.md`. If the project has a `.github/pull_request_template.md`, use it instead of the template above.
-
-**Do NOT clean up the worktree.** The branch is not yet merged; you may need the worktree for reviewer feedback, fixes, or rebase. (Phase 5 table.)
-
-### Option 3 — Keep As-Is
-
-No git state changes. Report:
-
-```
-Keeping branch <branch-name> as-is.
-Worktree preserved at <worktree-path> (or "same working directory" if not a worktree).
-Resume later with `git checkout <branch-name>` or re-invoke `/finish-branch`.
-```
-
-**Do NOT clean up the worktree.** The user explicitly wants to come back.
-
-### Option 4 — Discard
-
-**Confirmation gate — required.** Write out what will be permanently lost:
-
-```
-Discarding will permanently delete:
-
-- Branch: <branch-name>
-- Commits that exist only on this branch (N commits):
-    <list, abbreviated if very long>
-- Worktree at <worktree-path> (if applicable)
-- Remote branch origin/<branch-name> (if it exists)
-
-This cannot be undone via `git checkout` — only via the reflog (≤30 days by default).
-
-To proceed, type exactly: discard
-```
-
-**Wait for the user to type the literal word `discard`.** Anything else — "yes," "y," "confirm," a number — does not qualify. Re-prompt.
-
-If confirmed:
-
-```bash
-git checkout <base-branch>
-git branch -D <branch-name>                           # force delete
-git push origin --delete <branch-name> 2>/dev/null    # delete remote if it exists; ignore error if not
-```
-
-Clean up the worktree (Phase 5).
-
-## Phase 5 — Worktree Cleanup
-
-| Option | Cleanup worktree? |
-|---|---|
-| 1. Merge locally | **Yes** |
-| 2. Push + PR | **No** (may still be needed for review feedback) |
-| 3. Keep as-is | **No** (user explicitly wants it) |
-| 4. Discard | **Yes** |
-
-**If cleanup applies:**
-
-```bash
-git worktree list                                     # confirm you're in a worktree
-git worktree remove <worktree-path>                   # non-destructive if clean
-```
-
-If `git worktree remove` refuses (unclean state somehow): surface the reason to the user. Don't force removal without their consent — a dirty worktree may contain work they intended to rescue.
-
-**If no cleanup:** done. Report final state.
-
-## Output
-
-Short final report, not a celebration:
-
-```
-Branch <branch-name>:
-  - Outcome: <1 | 2 | 3 | 4>
-  - <specific state change, e.g. "merged into main; branch deleted; worktree removed">
-  - Next: <what the user would do next — e.g. "await PR review", "resume work", "start a new branch">
-```
-
-No emojis. No "🎉 all done!" No AI attribution. See `commits.md`.
-
-## Critical Rules
-
-**DO:**
-- Run verification before offering the outcome menu. No exceptions.
-- Present exactly four options, clearly labeled. The forced choice is the point.
-- Require the literal word `discard` for Option 4.
-- Re-verify after a merge (Option 1) — merges can integrate textually while breaking semantically.
-- Clean up worktrees only for Options 1 and 4.
-
-**DON'T:**
-- Offer options with failing verification — the branch isn't finished.
-- Editorialize the menu ("you should probably do option 2"). The user picks.
-- Accept "y" or "yes" for the discard gate. Literal word `discard`.
-- Clean up worktrees after Option 2 or 3 — the user needs them.
-- Add AI attribution to commit messages, PR descriptions, or output.
-
-## Common Mistakes This Prevents
-
-- **Open-ended "what now?" at the end of work.** Natural but corrosive — the user either has to improvise the workflow or restate their preference each time. The four-option menu ends the improvisation.
-- **Accidental destructive deletes.** "Discard this branch?" → "y" → 3 days of work gone. The typed-word gate turns one muscle-memory keystroke into a deliberate action.
-- **Merge-then-oops.** Text-level merge completes; semantic integration is broken; the user didn't notice because they didn't re-run the tests on the merged result. Phase 4 Option 1 re-verifies.
-- **Worktree amnesia.** Cleaning up a worktree after Option 2 (push + PR) means losing local state just when the reviewer asks for a fix. The cleanup matrix keeps the worktree exactly when it's still needed.
-- **"Generated by Claude Code" trailing into a PR.** The no-attribution rule from `commits.md` applies here too — the PR body is committed content under the project's identity, not Claude's.
-
-## Integration with Other Skills
-
-**Before this skill:**
-- `/review-code` — run a review on the branch before finishing it for significant changes
-- `/arch-evaluate` — if the branch touches architecture, audit against `.architecture/brief.md`
-
-**After this skill:**
-- If Option 2 (PR opened): reviewer feedback comes in → address → `/finish-branch` again on updated state
-- If Option 1 (merged locally): branch is done; if this closes a ticket / ADR, update tracking
-- If Option 3 (kept): resume later; re-invoke when ready
-- If Option 4 (discarded): often paired with `/brainstorm` or `/arch-design` to retry the problem differently
-
-**Companion rules (not skills) this skill defers to rather than re-teaching:**
-- Verification discipline → `verification.md`
-- Commit message conventions + no AI attribution → `commits.md`
-- Review discipline (for anything pre-merge) → `review-code`
diff --git a/prompt-engineering/SKILL.md b/prompt-engineering/SKILL.md
deleted file mode 100644
index d566286..0000000
--- a/prompt-engineering/SKILL.md
+++ /dev/null
@@ -1,228 +0,0 @@
----
-name: prompt-engineering
-description: Craft prompts (commands, hooks, skill descriptions, sub-agent instructions, system prompts, one-shot requests to other LLMs) that do what they're meant to and resist common failure modes. Covers four moves that determine whether a prompt holds up: classify the prompt type (discipline-enforcing / guidance / collaborative / reference) to pick the right tone and techniques; apply the persuasion framework appropriate to that type (seven principles from Meincke et al. 2025, including which to avoid — notably Liking, which breeds sycophancy); match task fragility to degrees of freedom (high/medium/low); and spend the context window like a shared resource. Also contains a brief reference for classical techniques (few-shot, chain-of-thought, system prompts, templates). Use both in design mode (asking for help writing a new prompt from scratch) and critique mode (paste a draft, get it rewritten to resist common failure modes). Do NOT use for prose editing unrelated to LLM prompts (use a writing skill), for implementing application code that uses an LLM (different scope), or for content moderation / prompt-injection defense (adjacent but separate domain).
----
-
-# Prompt Engineering
-
-Every command, hook, skill description, sub-agent instruction, and system prompt is a prompt. This skill helps make them resist the failures that chew through engineering time: sycophantic sub-agents, vague rules agents rationalize around, verbose instructions that cost tokens without changing behavior, trigger descriptions that don't actually trigger.
-
-Two modes of use:
-
-- **Design mode** — you need a new prompt. Ask what you're building, and this skill walks through type-classification and technique selection.
-- **Critique mode** — you have a draft. Paste it, and this skill identifies the prompt type, flags failure risks, and rewrites.
-
-## When to Use
-
-- Writing a new Claude Code command, hook, skill SKILL.md, or sub-agent instruction
-- Writing a system prompt for any LLM interaction
-- Debugging a prompt that's misbehaving (sub-agent drifts, skill under-triggers, rule gets rationalized around)
-- Reviewing someone else's prompt before it ships
-
-## When NOT to Use
-
-- General prose editing unrelated to LLM prompts (use a writing skill)
-- Implementation code that calls an LLM (different scope — covered by language skills)
-- Prompt-injection defense / content moderation (adjacent; separate discipline)
-- Casual single-turn requests where a prompt's behavior isn't production-critical
-
-## The First Move: Classify the Prompt Type
-
-Prompts fail in different ways depending on what they're for. Before writing technique, name the type. Four categories:
-
-### 1. Discipline-enforcing
-
-**What it is:** The prompt must produce the *same* behavior every time, under any condition. No judgment calls, no rationalization.
-
-**When you have one:** Security checks, deployment gates, safety rules, pre-commit hooks, FedRAMP-style compliance verifiers, PII scanners, rate-limit enforcers.
-
-**Failure mode:** The model finds reasons to deviate ("but in this case, maybe..."). Soft verbs like "should try" or "generally" are rationalization fuel.
-
-**Success criteria:** Identical output on edge cases. The model never "makes an exception."
-
-### 2. Guidance / technique
-
-**What it is:** The prompt teaches a technique and expects the model to apply it with judgment. Outcome varies with context.
-
-**When you have one:** Refactoring patterns, debugging workflows, test-writing guidance, architecture advice, code review checklists.
-
-**Failure mode:** Either over-constrained (turns judgment into cargo-culting) or under-constrained (model freelances and misses the pattern).
-
-**Success criteria:** Technique applied correctly across varied inputs; model can explain why it deviated when it did.
-
-### 3. Collaborative
-
-**What it is:** Peer-level work. Review, brainstorm, pair-editing, technical feedback. The user wants an honest second opinion, not a cheerleader.
-
-**When you have one:** Code review agents, design-review agents, editing help, technical critique sub-agents, brainstorming partners.
-
-**Failure mode:** **Sycophancy.** Soft-pedaling real issues because the default LLM tone is agreeable. This is the Liking principle firing by default, and for collaborative prompts it's usually *bad*.
-
-**Success criteria:** Honest assessment. Willing to say "this is wrong" or "there are no issues." Structured output to prevent hand-waving.
-
-### 4. Reference
-
-**What it is:** Metadata, index content, trigger descriptions, glossaries, API references. The prompt's job is to be findable and clear, not to change behavior.
-
-**When you have one:** Skill frontmatter descriptions, API endpoint docs, keyword lists for search, glossary entries.
-
-**Failure mode:** Under-triggering (skill never fires) or ambiguity (reader can't tell scope).
-
-**Success criteria:** Triggers reliably on the intended phrases; explicit about what's in and out of scope.
-
-## The Persuasion Framework
-
-LLMs are parahuman — they were trained on human text that's full of persuasion signals, and they respond to those same signals. Use this deliberately, not unconsciously.
-
-### The Seven Principles
-
-Adapted from Meincke et al., *Persuasion and Compliance in Large Language Models* (2025, N≈28,000 conversations). Two-principle combinations shifted compliance rates from ~33% to ~72%.
-
-- **Authority** — Non-negotiable framing. "YOU MUST", "Never", "Always", "No exceptions."
-- **Commitment** — Force explicit action. "Announce the rule you're applying before applying it." "Output your checklist with each item checked."
-- **Scarcity** — Time-bound or sequence-bound. "Before proceeding." "Immediately after X." "First, then."
-- **Social Proof** — Universal pattern language. "Every time." "The failure mode is always…". "Skipping this step = bug in production."
-- **Unity** — Collaborative framing. "We're colleagues." "Our codebase." "Let's work through this."
-- **Reciprocity** — Exchanges of effort. Rarely needed for prompts; often reads as manipulative.
-- **Liking** — Warmth and agreement. **Creates sycophancy.** Usually actively harmful for work where you need honest assessment.
-
-### Which Principles By Prompt Type
-
-| Prompt Type | Use | Avoid |
-|---|---|---|
-| Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
-| Guidance | Moderate Authority + Unity | Heavy Authority (feels dogmatic); Liking |
-| Collaborative | Unity + Commitment | Authority (hostile), **Liking (sycophancy)** |
-| Reference | Clarity alone — minimize persuasion | All persuasion framings |
-
-### Why This Matters
-
-- **Bright-line rules reduce rationalization.** "Never commit without running tests" leaves no exception space. "Try to run tests before committing" invites "well, this is just a typo fix, surely…"
-- **Implementation intentions create automatic behavior.** "When you see X, do Y" beats "generally, do Y where appropriate."
-- **Liking is the default LLM tone.** If you don't actively counteract it on collaborative prompts, you'll get agreement bias.
-
-## Degrees of Freedom
-
-Match prompt specificity to how much the task tolerates variation.
-
-- **Low freedom** — exact script, exact format, no deviation. Use when consistency is the whole point: deployment commands, migration scripts, checksum comparisons. "Run `python scripts/migrate.py --verify --backup`" not "run the migration script."
-- **Medium freedom** — parameterized pattern or pseudocode skeleton. The model fills in pieces within a named structure. Most common. Use for most skills and commands.
-- **High freedom** — open instructions, multiple valid approaches. Use when context drives the answer and you trust the model to navigate. Code review, brainstorming, exploratory debugging.
-
-Mismatches to avoid:
-
-- High-freedom prompt for a low-tolerance task (deployment, security, crypto): the model improvises; production breaks
-- Low-freedom prompt for an open task (code review, design): the model produces identical output regardless of context; signal is lost
-
-## Context Window Economics
-
-The context window is a shared resource. Every token spent on verbose instructions is a token not available for the actual work or for other loaded context.
-
-Questions to ask of each sentence in a prompt:
-
-- Does the model need this? Or can it be assumed?
-- Is this stable enough to go in the system prompt, freeing up the user-message budget?
-- Does this paragraph justify its token cost, or is it restating what a nearby paragraph already said?
-- Would a short example convey what three sentences of description do?
-
-**Progressive disclosure for skills specifically:** Skill metadata (name + description) is always in context; the SKILL.md body loads only when the skill triggers; referenced files load only when needed. Put triggering content in the frontmatter, procedural content in the body, exhaustive reference material in linked files.
-
-## Classical Techniques — Brief Reference
-
-Covered extensively elsewhere (Anthropic docs, OpenAI cookbook, the prompt-engineering literature). Short pointers:
-
-- **Few-shot learning** — Include 2-5 input/output examples rather than describing the pattern in prose. Best for consistent formatting, edge case handling, subtle behaviors that are easier to show than tell.
-- **Chain-of-thought** — Request step-by-step reasoning before the answer. Add "think through this step by step" for zero-shot; include a worked reasoning trace for few-shot. Best for multi-step logic; significant accuracy gains on analytical tasks.
-- **System prompts** — For stable role, constraints, and output format that persist across turns. Keep user messages for the varying task content.
-- **Templates** — Parameterize prompts with variables for reuse. Version-control them as code.
-
-These are well-documented; don't re-teach here. Lean toward the four moves (type, persuasion, freedom, economics) — those are what most prompt work actually needs.
-
-## Design Mode Workflow
-
-When asked to design a prompt from scratch:
-
-1. **Classify the type.** Ask the user to pick one, explaining each:
-
-   > - **Discipline-enforcing** — must behave identically every time (security, deployment, safety)
-   > - **Guidance** — teaches a technique the model applies with judgment (refactoring, debugging)
-   > - **Collaborative** — peer work; honest assessment matters more than agreement (review, critique)
-   > - **Reference** — metadata, index, trigger description (skill frontmatter, glossary)
-
-2. **Confirm success criteria.** How will the user know the prompt works? One measurable sentence.
-
-3. **Pick persuasion principles** from the by-type matrix. Name them to the user so the framing is deliberate.
-
-4. **Match degrees of freedom** to task fragility. Explain the choice.
-
-5. **Draft and review.** Write the prompt; walk the user through the choices. Short draft beats long. Bold imperative beats hedging.
-
-6. **Apply the ethics test** (see below). If the prompt would embarrass the user if its recipient understood the techniques being used, revise.
-
-## Critique Mode Workflow
-
-When handed an existing draft:
-
-1. **Classify what type it actually is** — often different from what the user thinks it is. A "review" agent phrased in soft Liking language is a collaborative prompt failing at sycophancy.
-2. **Audit for principle mismatches.** Is a collaborative prompt using Authority? Is a discipline prompt hedging? Is Liking sneaking into a review prompt?
-3. **Check degrees of freedom** against task fragility.
-4. **Check token economy.** Redundancy, restated instructions, unnecessary preamble.
-5. **Check for footguns** from the anti-patterns list.
-6. **Rewrite** — show before/after. Name the changes.
-
-## Ethics Test
-
-Use persuasion to make prompts reliable, not to manipulate. Before shipping any prompt that uses strong Authority, Commitment, or Social Proof framing:
-
-> If the person being persuaded (the LLM, yes, but also any human user downstream of its output) fully understood the techniques in use, would they still consent to the behavior being requested?
-
-If the answer is "no" or "I'd rather they didn't think about it" — revise. Legitimate uses: preventing security slips, enforcing reliability, protecting the user's genuine interests. Illegitimate: extracting consent, bypassing reasonable caution, guilt-inducing framing to force output.
-
-Don't confuse discipline with manipulation. Discipline serves the user; manipulation serves the prompt author's convenience at the user's expense.
-
-## Anti-Patterns
-
-- **Liking by default on a collaborative prompt.** Soft, agreeable language in a review or critique agent produces "looks good to me" theatre. Use Unity + Commitment; strip Liking explicitly.
-- **Hedging on a discipline-enforcing prompt.** "Please try to" and "generally" are loopholes. If the rule is absolute, use absolute language.
-- **Authority stack on a guidance prompt.** All-caps MUSTs on a technique skill feel dogmatic and cargo-cult; the model obeys surface form while missing the idea.
-- **High freedom on a fragile task.** "Deploy the service" invites improvisation; "run `deploy.sh --env prod --verify`" doesn't.
-- **Restating the instruction in prose.** If a sentence is repeating what the previous one said, cut one of them.
-- **Example pollution.** 12 few-shot examples when 3 would do. The model generalizes better from fewer good examples than from many with subtle inconsistencies.
-- **Instruction buried below prose.** Put the action up front. "Read this and then do X" gets the action; "Here's some context about Y, and also we want Z, and consider that…" buries it.
-- **Ambiguous scope in reference prompts.** Trigger descriptions without explicit *do-not-trigger* conditions under-fire and mis-fire.
-- **Persuasion without purpose.** Adding "YOU MUST" or "Every time" because it sounds rigorous — if the task tolerates variance, strong framing just hardens the wrong behaviors.
-
-## Review Checklist
-
-Before declaring a prompt done:
-
-- [ ] The type is classified (discipline / guidance / collaborative / reference)
-- [ ] The persuasion principles match the type (and Liking is explicitly excluded for collaborative)
-- [ ] Degrees of freedom match the task's fragility
-- [ ] Each sentence earns its tokens
-- [ ] Trigger phrases and (for reference prompts) explicit negative conditions are present
-- [ ] Sycophancy traps (for collaborative) and rationalization loopholes (for discipline) are closed
-- [ ] The ethics test passes
-
-## Output
-
-Return the drafted or critiqued prompt, with a short note explaining:
-
-1. Prompt type you identified
-2. Persuasion principles applied (and which you explicitly avoided)
-3. Degrees-of-freedom level chosen
-4. One or two things the revision improved
-
-No long explanations. The prompt itself should demonstrate the principles, not narrate them.
-
-## Related Skills
-
-- **`codify`** — after shipping a prompt that worked well, capture the pattern into `CLAUDE.md`
-- **`brainstorm`** — when the *shape* of the prompt isn't clear yet, run brainstorm first to nail down the requirement, then return here for the prompt itself
-- **`arch-decide`** — if a prompt-design choice is significant enough to document (e.g., "we standardize on Authority + Commitment for all deployment prompts"), record it as an ADR
-
-## References
-
-- Meincke, L., et al. (2025). *Persuasion and Compliance in Large Language Models.* N≈28,000 AI conversations; 7 persuasion principles; compliance shifts of ~2x with appropriate combinations.
-- Anthropic prompt engineering guidance — context window as shared resource; progressive disclosure; degrees-of-freedom framing.
-- Classical prompt engineering literature (few-shot, CoT, system prompts) — assumed background; not re-taught here.
diff --git a/respond-to-cj-comments/SKILL.md b/respond-to-cj-comments/SKILL.md
deleted file mode 100644
index 61cf533..0000000
--- a/respond-to-cj-comments/SKILL.md
+++ /dev/null
@@ -1,232 +0,0 @@
-# /respond-to-cj-comments — Process `cj:` Annotations in a File
-
-Scan a file for `cj:` comments (Craig's in-line instructions and questions) and handle each one with subagent-delegated accuracy. Used to batch-process notes Craig leaves in documents, code, or org files for later action.
-
-## Usage
-
-```
-/respond-to-cj-comments [FILE_PATH]
-```
-
-If no file is given, ask the user for the target file. If the user references "this file" or similar with a visible editor buffer, confirm the path before starting.
-
-## What counts as a `cj:` comment
-
-A `cj:` comment is any line where the text `cj:` (case-insensitive) starts the comment content. The comment marker varies by file type:
-
-| File type                                      | Example                                                      |
-|------------------------------------------------|--------------------------------------------------------------|
-| Org / plain text                               | `cj: what does this section actually deliver?`               |
-| Markdown                                       | `<!-- cj: split this paragraph before the Scope section -->` |
-| Python / shell / YAML                          | `# cj: check whether this still matches reality`             |
-| JavaScript / C / Java / TypeScript / Rust / Go | `// cj: verify the error wrapping logic here`                |
-| Lisp / Elisp / Scheme                          | `;; cj: is this hook still wired?`                           |
-| LaTeX                                          | `% cj: rewrite, this sounds too corporate`                   |
-| HTML / XML                                     | `<!-- cj: move this above the fold -->`                      |
-
-Multi-line comments are supported when continuation lines keep the same comment prefix:
-
-```
-# cj: please check whether the feature flag is still wired.
-# also confirm the fallback path matches the diagram in docs/arch.org.
-```
-
-Treat the whole contiguous block as one `cj:` item.
-
-## Instructions
-
-### 1. Scan the file
-
-Read the target file and collect every `cj:` comment. For each one, record:
-
-- File path and line number (or range for multi-line items)
-- Enclosing context:
-  - *Org file:* the parent `* TODO` / heading path (e.g. `Parent ▸ Subheading ▸ TODO Foo`)
-  - *Code:* the surrounding function or class
-  - *Other:* a few lines of surrounding prose
-- The full comment text, including continuation lines
-- The raw comment-marker style so the line can be located and removed later
-
-### 2. Classify each item
-
-For each `cj:` comment, decide whether it's:
-
-- **Instruction** — a request for action. Signals: imperative verbs (check, rewrite, fix, remove, add, draft), "please do X".
-- **Question** — a request for information. Signals: ends with `?`, starts with who/what/when/where/why/how, "is this", "does this", "should we".
-- **Both** — contains an instruction and a question. Handle the instruction side, answer the question side.
-
-When truly ambiguous, treat as both.
-
-### 3. Delegate to subagents — accuracy over speed
-
-For each non-trivial item, spawn an Agent subagent. Trivial items (e.g. "remove this blank line") the main thread can handle directly without delegation.
-
-Two classes of subagent:
-
-- **Instruction subagent** — given the comment, the enclosing context, and the file path, figures out what change is needed and reports a concrete recommendation. Reports: what should change, file paths + line numbers, any follow-up the user needs to decide.
-- **Question subagent** — researches the question. For questions requiring codebase exploration, ticket lookups, web research, or cross-file reasoning, give the subagent explicit scope (files to read, tickets to check, search terms). Reports: direct answer, evidence (file:line refs, quotes, sources), confidence level, any remaining unknowns.
-
-Run subagents in parallel when the items are independent. Wait for review between batches if items depend on each other.
-
-Prompt every subagent with four required fields (see `subagents.md` for the full contract):
-
-1. **Scope** — one bounded comment, named file, specific action or question
-2. **Context** — paste the comment verbatim plus the surrounding context from step 1
-3. **Constraints** — do not touch other `cj:` comments, do not refactor unrelated code, preserve file formatting
-4. **Output format** — for instructions: a specific change proposal (before/after or file:line patch), plus URLs or file paths for every external source consulted. For questions: answer + evidence (including URLs or file:line refs for every claim) + confidence level + under 300 words. Subagents must cite sources. A claim without a URL or file reference doesn't belong in the report.
-
-The main thread applies edits. Subagents report; they do not write to the source file. This keeps formatting consistent and makes review easier.
-
-**Accuracy over speed is the rule.** Three subagents and fifteen minutes to answer one question correctly beats a fast wrong answer.
-
-### 4. Apply changes
-
-For **instructions**:
-
-1. Review the subagent's proposed change before applying. Check that it matches the comment's intent and doesn't introduce a new issue. If the proposal looks wrong or incomplete, dispatch a fix subagent with the failure report as its context. Don't retry the diagnosis in the main thread (see the Context-Pollution Rule in `subagents.md`). After two failed fix attempts, stop and surface the problem to the user.
-2. Apply the change once the proposal is sound. Edits come from the main thread.
-3. **Org-mode subheader format.** When inserting new content under an existing org-mode task (research findings, drafted messages, log entries, recorded Slack replies), use a timestamped subheader one level deeper than the parent task:
-
-   ```
-   *** 2026-04-23 Thu @ 15:20:52 -0500 <short description>
-   <content>
-   ```
-
-   Use one more `*` than the parent task's heading level. If the parent is `** TODO`, the subheader is `***`. If the parent is `**** TODO`, the subheader is `*****`. Generate the timestamp with `date "+%Y-%m-%d %a @ %H:%M:%S %z"` so it's accurate, not estimated.
-
-4. **Surface URLs.** If the subagent's output includes URLs, file paths, or external references, list them under the updated task content. After applying the edit, offer to convert them into explicit org-mode links (`[[url][label]]` or `[[file:path][label]]`) at the location where the `cj:` comment originated, or elsewhere in the task if that fits better.
-
-5. If the comment is inside an org-mode `TODO` heading, mark that `TODO` as `DOING` when work begins. Leave it `DOING` for the user to advance to `DONE` after review.
-
-6. Remove the `cj:` comment from the file (the entire contiguous block, including continuation lines).
-
-For **questions**:
-
-1. Capture the answer in the summary (step 5).
-2. Remove the `cj:` comment. The user can re-open the thread conversationally if they have follow-ups.
-
-For **writing destined for public channels** (commit messages, PR descriptions, PR comments, Slack or email messages, public docs):
-
-1. Invoke `/humanizer` on the draft.
-2. Apply these voice and attribution rules to the writing you produced (adapted from `/home/cjennings/code/rulesets/claude-rules/commits.md`):
-
-   **No AI attribution, anywhere.** Never include AI, LLM, Claude, or Anthropic attribution in any public-facing text. That means:
-   - No `Co-Authored-By: Claude` (or Claude Code, or any AI) trailers
-   - No "Generated with Claude Code" footers or equivalents
-   - No emojis implying AI authorship
-   - No references to Claude, Anthropic, LLM, or "AI tool" as a credited contributor
-   - Strip any attribution a tool or template inserts by default
-
-   **First person where it fits.** When the subject is Craig or a decision he made, use "I". When the subject is a team decision or shared rationale, "we" fits. Name other authors when crediting their work ("Kostya's PR #116 did X"). Avoid third-person press-release voice ("This PR introduces X", "This change restores Y") unless the code or system is the actor.
-
-   **Brief. Terse is fine.** A one-sentence body beats a paragraph saying the same thing. If the subject line covers it, skip the body entirely. Cut every clause that restates what the diff, the card, or the surrounding context already shows. Length isn't a proxy for care. Rhetorical padding ("worth noting", "it's important to understand") always comes out.
-
-   **Kind.** Text directed at a specific person: acknowledge them when it fits, without pouring it on. Frame disagreement as your read ("I think...", "my read was...", "did you mean X?"). Leave room for the other person to have seen something you didn't. A polite question beats a defensive explanation.
-
-   **Plain English.** Complete sentences a low- or mid-level engineer who isn't a native speaker can read without stumbling. Replace semicolons with periods or commas. Prefer contractions ("it's", "that's", "don't", "we're"). Split long sentences on conjunctions ("so", "and", "but") when meaning survives the split. No em-dashes, use regular dashes instead. Em-dashes break in GitHub, Linear, and Slack.
-
-   **Don't stack jargon.** A sentence that chains three or more type signatures, API names, or compiler concepts reads as a wall. Break it into shorter sentences. Keep the terms a reader will grep for, drop the ones that name compiler internals.
-
-3. Scan the output for AI-attribution tells. If you catch yourself having written any of these, stop, delete, and rewrite:
-   - `Co-Authored-By: Claude`
-   - `Generated with Claude Code` (with or without the robot emoji)
-   - "Created with Claude Code"
-   - "Assisted by AI"
-
-   Rewrite as Craig would write it: concise, focused on the content, with no mention of how the text was produced.
-
-If the writing will be *posted* (not just saved as a draft), follow the Review and Publish flow in `commits.md` — draft to `/tmp/`, open in `emacsclient -n`, wait for Craig's explicit approval before posting.
-
-**Private writing** (todo.org entries, session-context.org, scratch notes, internal rulesets) skips the humanizer pass. Voice rules still apply where relevant, but the overhead of a draft-and-approve cycle is not warranted.
-
-### 5. Report
-
-Produce one summary at the end, structured:
-
-```
-## Summary
-
-### Instructions (N)
-
-1. <file:line> <one-line recap>
-   → done. <brief what-changed>.
-   → TODO status: <DOING | unchanged>.
-2. ...
-
-### Questions (N)
-
-1. <file:line> Q: <comment verbatim, trimmed to one line>
-   A: <direct answer>
-   Evidence: <file:line refs, ticket IDs, URLs, quotes>
-   Confidence: <high | medium | low with caveat>
-2. ...
-
-### Follow-ups
-
-- <anything the user needs to decide, review, or act on>
-
-### Unresolved
-
-- <any `cj:` comments that couldn't be handled — say why, leave them in place>
-
-### File state
-
-State explicitly one of:
-- "File is now clean. All `cj:` comments resolved and removed."
-- "File still contains N unresolved `cj:` comment(s), listed under Unresolved above."
-
-Never leave the reader guessing about whether the file is ready for follow-up work.
-```
-
-Keep answers direct. If a question has a simple answer, one sentence. If it needs nuance, two or three. Do not pad. The user reads the summary, not the intermediate work.
-
-**Long summaries go to a file.** If the summary runs beyond six to eight items, or beyond ~500 words of prose, write it to `/tmp/respond-to-cj-summary-YYYY-MM-DD.org` as an org file and open it in `emacsclient -n <path>`. The user may annotate the file, add new `cj:` comments in-line, or answer open questions in-line, then save. After the user saves, re-read the file for any additional instructions or answers and handle those before ending the run. A long summary dumped straight into chat is hard to comment on and hard to return to; the file form makes it reviewable and iterable.
-
-**Every file path in the summary must be a clickable org-mode link.** Use absolute paths: `[[file:/absolute/path/to/file][display label]]`. Plain paths wrapped in `=verbatim=` markers aren't clickable in emacs. URLs in the summary should also use link form: `[[https://...][label]]`. This applies to every file reference in the summary file — source files read, artifacts created, archive destinations, anywhere the user might want to jump. If there's a path in the summary, it should open when clicked.
-
-### 6. Cleanup pass
-
-Remove every `cj:` comment that was handled in step 4 (instructions done) or step 5 (questions answered). The file is clean after the skill runs. Any comment left unresolved stays in place and is listed under `### Unresolved` in the summary with the reason.
-
-Confirm the cleanup by re-scanning the file after removals. If any `cj:` line survives that shouldn't, remove it and note the correction.
-
-## Principles
-
-- **Accuracy > speed.** Subagent generously. A wrong answer to a `cj:` comment is worse than a slow one.
-- **Don't guess.** If a question needs verification and verification isn't possible, say so. Surface unknowns rather than fabricate.
-- **Preserve the file.** Don't reformat surrounding lines. Don't reorder tasks. Touch only what the comment asks for, plus the comment's own removal.
-- **The user reads the summary, not the process log.** Write summaries that are directly useful.
-- **Public writing gets humanizer + commits.md. Private writing doesn't.** Don't over-process internal notes.
-
-## Anti-patterns
-
-- Handling complex `cj:` items inline in the main thread instead of subagenting.
-- Batching unrelated `cj:` comments into one giant subagent prompt.
-- Removing a `cj:` comment before the user has seen the answer in the summary.
-- Skipping the humanizer + commits.md pass on public-facing writing because it "looked fine already."
-- Guessing on a question instead of spawning a research subagent.
-- Letting a subagent edit the source file directly — review surface loss.
-
-## Example
-
-Input file: `todo.org` containing:
-
-```
-** TODO Draft the D2P2 Pillar 1 writeup
-cj: what do we actually know about Orion's Belt's partner API contract?
-cj: check whether Redwire's AdTech feed is still in scope. Eric mentioned deprecating it in sprint planning.
-```
-
-Skill run:
-
-1. Scans, finds two `cj:` items (one question, one instruction-with-verification).
-2. Marks the parent `** TODO` as `DOING`.
-3. Spawns two subagents in parallel:
-   - **Q subagent:** reads `deepsat/knowledge.org`, greps ticket history for "Orion's Belt API", checks recent meeting transcripts. Reports findings with evidence.
-   - **Instruction-with-verification subagent:** reads sprint-planning transcripts, searches for "Redwire AdTech deprecation", checks Linear for related tickets. Reports whether the feed is still in scope.
-4. Main thread applies the `DOING` status change, writes the summary with both answers, removes both `cj:` comments.
-5. Reports:
-   - Q1: What we know about Orion's Belt's partner API contract → answer + file:line refs + confidence.
-   - Q2/Instruction: Redwire AdTech scope → answer + evidence + recommendation for whether to cite in Pillar 1.
-
-Craig reads the summary, decides what to do with the Redwire finding, and the TODO stays in `DOING` until he advances it.
diff --git a/respond-to-review/SKILL.md b/respond-to-review/SKILL.md
deleted file mode 100644
index 3508fa5..0000000
--- a/respond-to-review/SKILL.md
+++ /dev/null
@@ -1,54 +0,0 @@
-# /respond-to-review — Evaluate and Implement Code Review Feedback
-
-Evaluate code review feedback technically before implementing. Verify suggestions against the actual codebase — don't implement blindly.
-
-## Usage
-
-```
-/respond-to-review [PR_NUMBER]
-```
-
-If no PR number is given, check the current branch's open PR for pending review comments.
-
-## Instructions
-
-### 1. Gather the Feedback
-
-- Fetch review comments using `gh api repos/{owner}/{repo}/pulls/{number}/comments` (for inline review comments) and `gh api repos/{owner}/{repo}/issues/{number}/comments` (for top-level PR conversation comments)
-- Read each comment in full. Group related comments — reviewers often raise connected issues across multiple comments.
-
-### 2. Evaluate Each Item
-
-For each review comment, before implementing anything:
-
-1. **Restate the suggestion in your own words** — make sure you understand what's being asked. If you can't restate it clearly, ask for clarification before touching code.
-2. **Verify against the codebase** — check whether the reviewer's premise is correct. Reviewers sometimes misread code, miss context, or reference outdated patterns. Read the actual code they're commenting on.
-3. **Check YAGNI** — if the reviewer suggests a "more proper" or "more robust" implementation, grep the codebase for actual usage. If nothing uses the suggested pattern today, flag it: "This would add complexity without current consumers. Should we defer until there's a concrete need?"
-4. **Assess the suggestion** — categorize as:
-   - **Correct and actionable** — implement it
-   - **Correct but out of scope** — acknowledge and create a follow-up issue
-   - **Debatable** — present your reasoning and ask for the reviewer's perspective
-   - **Incorrect** — explain why with evidence (file paths, test results, documentation)
-
-### 3. Respond
-
-- Lead with technical substance, not agreement
-- If you disagree, explain why with code references — not opinion
-- If you're unsure, say so and ask a specific question
-- Never implement a suggestion you don't understand
-
-### 4. Implement
-
-- Address simple fixes first, complex ones after
-- Test each change individually — don't batch unrelated fixes into one commit
-- Run the full test suite after all changes
-- Commit with conventional messages referencing the review: `fix: Address review — [description]`
-
-### 5. Report
-
-- Summarize what was implemented, what was deferred, and what needs discussion
-- Link any follow-up issues created for out-of-scope suggestions
-
-## Content scope
-
-Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/review-code/SKILL.md b/review-code/SKILL.md
deleted file mode 100644
index 1bb2bde..0000000
--- a/review-code/SKILL.md
+++ /dev/null
@@ -1,353 +0,0 @@
----
-name: review-code
-description: Review code changes against engineering standards. Accepts a PR number, a SHA range (BASE..HEAD), the current branch's diff against main, staged changes, or a described scope ("the last 3 commits"). Audits CLAUDE.md adherence (reads root + per-directory CLAUDE.md), intent-vs-delivery (when given a plan/ADR/ticket), security, testing (TDD evidence + three-category coverage), conventions (conventional commits + no AI attribution), root-cause discipline, architecture (layering + scope), API contracts. Produces a structured report — Strengths, per-criterion PASS/WARN/FAIL, per-issue Critical/Important/Minor severity — ending with an explicit verdict (Approve / Request Changes / Needs Discussion) plus 1-2 sentence reasoning. Self-filters low-confidence findings; never flags pre-existing issues, lint/typecheck issues (CI handles those), or changes on unmodified lines. Use before merging a PR, before pushing a branch, or when reviewing a teammate's work. Do NOT use for proposing features (use brainstorm or arch-design), drafting implementation (use start-work or add-tests), standalone security audits (use security-check), or narrow style-only checks (a linter handles those).
----
-
-# /review-code
-
-Review code changes against engineering standards. Produce a structured report with strengths, per-criterion audit, severity-tagged issues, and an explicit verdict.
-
-## Usage
-
-Point the skill at code being reviewed in any of these ways:
-
-- `/review-code [PR_NUMBER]` — fetch the PR diff via `gh pr view` + `gh pr diff`
-- `/review-code BASE_SHA..HEAD_SHA` — any git range
-- `/review-code` (no args) — the current branch's diff against its merge base with `main`
-- `/review-code --staged` — only staged changes (pre-commit scrutiny)
-- Or describe it: "review my changes in the current branch" / "review the last 3 commits"
-
-Optionally, provide intent context for delivery grading:
-
-- `plan=docs/design/<feature>.md`
-- `adr=docs/adr/<NNNN>-<title>.md`
-- `ticket=LINEAR-<id>` (fetches the ticket for comparison)
-
-When intent context is given, the review grades "does this match what was asked?" on top of code hygiene. When not, that section is marked N/A.
-
-## Execution Model
-
-For substantive reviews on large diffs: **dispatch the perspective passes as parallel sub-agents** via the Agent tool. Each sub-agent starts with a clean context window — the reviewer shouldn't inherit the implementer's mental model. For small single-commit tweaks, run inline.
-
-## Phase 0 — Eligibility Gate
-
-Before reviewing, confirm the review is worth running. Skip with a short note if any apply:
-
-- PR is closed or merged (reviewing merged code is after-the-fact observation, not a gate)
-- PR is a draft not marked ready-for-review
-- Change is an automated dep bump (dependabot, renovate) — trust the bot + CI
-- Change is trivial (whitespace-only, typo-only, revert with obvious justification)
-- A `/review-code` report already exists for this SHA range (no new commits since)
-
-If skipping, report: "Skipped — <reason>" and stop.
-
-## Phase 1 — Gather Context
-
-1. **Resolve the input to a concrete diff:**
-   - PR: `gh pr view <n> --json title,body,baseRefName,headRefName,files` + `gh pr diff <n>`
-   - SHA range: `git diff <base>..<head>` + `git log <base>..<head> --format='%h %s%n%b'`
-   - Current branch: `git merge-base HEAD main` → diff from there
-   - `--staged`: `git diff --cached`
-
-2. **Collect CLAUDE.md files to audit against:**
-   - Root project `CLAUDE.md` if present
-   - Any `CLAUDE.md` in directories whose files the diff modified
-   - Their paths go into the audit; their content guides the CLAUDE.md adherence criterion
-
-3. **If intent context was provided**, fetch and read it. Extract: stated goal, scope, non-goals, acceptance criteria, linked ADRs.
-
-4. **Scope summary** — record for the final report: file count, added/removed lines, commit count, touched modules.
-
-## Phase 2 — Multi-Perspective Pass
-
-Each perspective is a distinct review angle. For substantial changes, dispatch them as parallel sub-agents. For small changes, run sequentially inline.
-
-Follow `subagents.md` for the dispatch contract — each perspective's prompt needs explicit scope, pasted diff context, constraints (don't flag unmodified lines, don't rewrite the PR), and a required output format. Perspectives are read-only and independent, so parallel fan-out is always safe here.
-
-### Perspective A — CLAUDE.md Adherence
-
-Read the CLAUDE.md files collected in Phase 1. For each rule they state, check whether the diff honors it. CLAUDE.md is writing guidance, so not every rule applies at review time — focus on rules that are *assertions about what committed code should look like*.
-
-Example: if CLAUDE.md says "prefer `cast()` over `# type: ignore`," flag any `# type: ignore` in the diff. If it says "always run `make validate-parens` before commit," that's process guidance, not a reviewable code attribute.
-
-### Perspective B — Shallow Bug Scan
-
-Read the file changes in the diff. Ignore context beyond the changes themselves. Focus on **large, obvious bugs**:
-
-- Null / undefined dereferences on values the code can't guarantee
-- Off-by-one errors in boundary conditions
-- Incorrect error handling (swallowing exceptions, returning success on failure paths)
-- Data mutation under concurrent access without coordination
-- Incorrect SQL / query shape (wrong join, missing where clause)
-- Obvious crashes (`raise` without arguments, typed nil, etc.)
-
-Avoid small issues and nitpicks; those are for the Minor severity tier, if worth mentioning at all.
-
-### Perspective C — Git History Context
-
-Run `git blame` on the modified lines. Read recent commits that touched the same files. Check:
-
-- Is this change contradicting a recent deliberate choice? (Someone just fixed X; now this PR re-introduces it.)
-- Is there a pattern of the same bug being fixed repeatedly here?
-- Is the surrounding code style / convention consistent with what's being added?
-
-### Perspective D — Prior PR Comments
-
-`gh pr list` the PRs that previously touched these files. Check their review comments. If prior reviewers flagged a pattern, check whether the current diff repeats it.
-
-### Perspective E — Code Comments In Scope
-
-Read comments in the modified files (both touched and nearby). If the code has guidance ("DO NOT call this before X is initialized"), check the diff complies.
-
-## Phase 3 — Criteria Audit
-
-For each criterion below, report **PASS / WARN / FAIL** with file:line references. WARN and FAIL findings become issues in the Phase 4 summary.
-
-### Intent vs Delivery (when intent context provided)
-
-- Does the implementation match the plan / ADR / ticket?
-- Are acceptance criteria demonstrably met?
-- Scope creep: changes not in the plan?
-- Missing requirements: plan items unimplemented?
-
-Skip if no intent context; note "not evaluated" in the report.
-
-### Security
-
-- No hardcoded secrets, tokens, API keys, or credentials
-- All user input validated at system boundary
-- Parameterized queries only (no SQL string concatenation)
-- No sensitive data logged (PII, tokens, passwords)
-- Dependencies pinned and auditable
-
-### Testing (TDD Evidence)
-
-- Tests exist for all new code
-- Test commits **precede** implementation commits (TDD workflow)
-- Three categories covered: Normal, Boundary, Error per function; thorough on edge cases
-- Tests are independent — no shared mutable state
-- Mocking at external boundaries only (network, file I/O, time) — domain logic tested directly
-- Test naming follows project convention
-- Coverage does not decrease without justification
-- Parameter-heavy functions: author considered pairwise coverage via `/pairwise-tests`? (Not required; worth noting if missing.)
-
-### Conventions
-
-- Type annotations on all functions (including return types)
-- Conventional commit messages (`feat:`, `fix:`, `chore:`, etc.)
-- **No AI attribution anywhere** — code, comments, commits, PR descriptions — see `commits.md`
-- One logical change per commit
-- Docstrings on public functions/classes
-
-### Root Cause & Thoroughness
-
-- Bug fixes address the root cause, not surface symptoms
-- Changes demonstrate understanding of surrounding code
-- Edge cases covered comprehensively, not just the happy path
-
-### Architecture
-
-- Request handlers thin; business logic in services/domain
-- No unnecessary abstractions or over-engineering
-- Changes scoped to what was asked (no drive-by refactoring)
-- Stated architecture respected — if `.architecture/brief.md` exists, check conformance (see `arch-evaluate`)
-
-### API Contracts
-
-- New endpoints have typed contracts or schemas defined
-- No raw dict/object responses bypassing the contract layer
-- Client-side types match server-side output
-- Data flows through the API layer, not direct data access from handlers
-
-## Phase 4 — Filter and Categorize
-
-### Confidence Filter
-
-For each issue surfaced by any perspective or criterion, self-scrutinize: **am I really sure this is a real issue, or could I be wrong?** Rate your confidence honestly:
-
-- **High** — verified by reading the code; matches a pattern that causes bugs in practice; or a direct CLAUDE.md violation you can cite
-- **Medium** — likely real but contingent on context you didn't fully verify
-- **Low** — looks off but you can't confirm; might be a false positive
-
-**Drop Low-confidence issues before the final report.** Medium and High issues appear; Medium ones noted as such where uncertainty is relevant.
-
-### False-Positive Filter
-
-Do **not** flag any of these as issues:
-
-- **Pre-existing issues** on unmodified lines — note separately as "for follow-up," don't block this PR
-- **Lint / typecheck / test failures** — CI handles those; don't run builds yourself
-- **Nitpicks a senior engineer wouldn't call out** — unless CLAUDE.md explicitly requires them
-- **Style issues** — formatters and linters handle these
-- **Issues explicitly silenced in code** (e.g., `# type: ignore[...]` with a reason, lint ignore comments) unless the silencing is unjustified
-- **Intentional changes** in functionality clearly related to the PR's stated goal
-- **Changes in unmodified lines** (real issues in files the PR touches but on lines it doesn't change)
-- **Framework behavior being tested** — see `testing.md` anti-patterns
-
-### Severity Categorization
-
-Remaining issues get tagged:
-
-- **Critical** — merge-blockers: security holes, broken functionality, data loss risk, missing acceptance criteria, AI attribution in committed content
-- **Important** — should fix: architecture problems, test gaps, error handling holes, pattern violations
-- **Minor** — nice to have: missing docstrings, docstring drift, small optimizations
-
-## Phase 5 — Output
-
-```markdown
-# Code Review — <PR title / branch name / SHA range>
-
-**Scope:** <N files, +X / -Y lines, Z commits>
-**Input:** <pr#N / base..head / current branch>
-**Intent context:** <plan/ADR/ticket link, or "none provided">
-**CLAUDE.md files audited:** <list of paths>
-
-## Strengths
-
-Three minimum. Specific, with file:line.
-
-- <Specific positive>
-- <Specific positive>
-- <Specific positive>
-
-## Per-Criterion Audit
-
-| Criterion | Status | Notes |
-|---|---|---|
-| CLAUDE.md Adherence | PASS / WARN / FAIL | ... |
-| Intent vs Delivery | PASS / WARN / FAIL / N/A | ... |
-| Security | PASS / WARN / FAIL | ... |
-| Testing (TDD Evidence) | PASS / WARN / FAIL | ... |
-| Conventions | PASS / WARN / FAIL | ... |
-| Root Cause & Thoroughness | PASS / WARN / FAIL | ... |
-| Architecture | PASS / WARN / FAIL | ... |
-| API Contracts | PASS / WARN / FAIL | ... |
-
-## Issues
-
-### Critical (must fix before merge)
-
-1. **<Title>**
-   - File: `<path>:<line>`
-   - Problem: <what's wrong>
-   - Why it matters: <impact>
-   - Fix: <concrete suggestion>
-
-### Important (should fix)
-
-1. **<Title>** — `<file>:<line>` — <one-sentence description>
-
-### Minor (nice to have)
-
-1. **<Title>** — `<file>:<line>`
-
-## Recommendations
-
-Meta-level suggestions: process changes, follow-up tickets, architectural drift observations.
-
-## Verdict
-
-**<Approve / Request Changes / Needs Discussion>**
-
-**Reasoning:** <1-2 sentences. Grounded in the audit.>
-```
-
-## Critical Rules
-
-**DO:**
-- Categorize issues by actual severity — not everything is Critical
-- Cite specifics — `file:line`, not vague prose
-- Explain why each issue matters, not just what it is
-- Acknowledge strengths — mandatory; three minimum
-- Give a clear verdict with reasoning
-- Trust CI for lint, typecheck, test runs; don't re-run them
-- Self-filter low-confidence findings before reporting
-
-**DON'T:**
-- Say "looks good" without citing what was checked
-- Mark style nitpicks as Critical
-- Give feedback on code you didn't actually read
-- Flag pre-existing issues as PR-blockers
-- Use emojis or marketing-adjacent language
-- Skip the verdict or hedge it
-- Add any AI attribution to the output
-
-## Example Output
-
-```markdown
-# Code Review — feat: add inventory export CSV (#447)
-
-**Scope:** 8 files, +312 / -24 lines, 5 commits
-**Input:** PR #447
-**Intent context:** docs/design/inventory-export.md
-**CLAUDE.md files audited:** CLAUDE.md, api/CLAUDE.md
-
-## Strengths
-
-- Characterization tests added before refactor (`tests/test_inventory_export.py:12-89`) — TDD evidence clear across commit order
-- Export batches via generators rather than loading full inventory (`inventory/export.py:42-78`) — handles the 50k-row case without OOM
-- API schema versioned from day one (`api/schemas/export.py:5`) — cleaner than most first-endpoints
-
-## Per-Criterion Audit
-
-| Criterion | Status | Notes |
-|---|---|---|
-| CLAUDE.md Adherence | PASS | No `# type: ignore` without justification; conventional commits clean |
-| Intent vs Delivery | PASS | Matches plan §3.2; acceptance criteria met |
-| Security | WARN | User-provided filename not sanitized (see Important #1) |
-| Testing (TDD Evidence) | PASS | 5 test commits precede 3 implementation commits |
-| Conventions | PASS | All conventional, no AI attribution detected |
-| Root Cause & Thoroughness | PASS | Empty inventory, Unicode, large batches all covered |
-| Architecture | PASS | Handler thin; logic in inventory service |
-| API Contracts | PASS | Schema defined; TS types match in client/ |
-
-## Issues
-
-### Critical
-
-None.
-
-### Important
-
-1. **Filename injection via user input**
-   - File: `api/views/inventory.py:42`
-   - Problem: User-provided `filename` passed directly to `Content-Disposition` header
-   - Why it matters: CRLF injection → header smuggling; also sets up a path-traversal risk if ever used for disk writes
-   - Fix: `secure_filename(user_filename)` (werkzeug) or regex-strip to `[A-Za-z0-9._-]+`
-
-### Minor
-
-1. **Missing type on private helper** — `inventory/export.py:22` — `_chunked()` return type unspecified
-2. **Docstring drift** — `inventory/service.py:104` — docstring describes pre-batch behavior
-
-## Recommendations
-
-- Consider `/pairwise-tests` on the `export_options` function (4 parameters × 2-3 values each); currently 3 hand-written tests cover a fraction of the combinatorial space
-
-## Verdict
-
-**Request Changes**
-
-**Reasoning:** Core implementation is strong — TDD evidence, thin handler, proper schema, good edge coverage. The filename-injection issue is a must-fix before merge; once resolved this is a clean approve.
-```
-
-## Anti-Patterns
-
-- **All-caps everything.** If everything is Critical, nothing is. Be honest about severity.
-- **No Strengths section.** Reviews that only list problems are inaccurate — you missed what's right. Minimum three.
-- **Verdict without reasoning.** "Request Changes" alone is unhelpful. One or two sentences on why.
-- **Criteria skipped silently.** If you didn't check API contracts, say "N/A — no API changes in this diff," not silence.
-- **Reviewing code you didn't read.** Don't feedback on files you skimmed.
-- **Running the build yourself.** CI does that. Don't re-verify what CI is for.
-- **Hedging ("might be an issue").** If you can't commit to it, it's Low-confidence — drop it.
-
-## Hand-Off
-
-- **Critical** → must be addressed before merge; author fixes, re-review via `/review-code` on the updated SHA
-- **Important** → fix, or deliberately defer with an ADR (run `/arch-decide`)
-- **Minor** → follow-up issues or a cleanup PR
-- **Intent-vs-Delivery gaps** → either file tickets for the missing pieces or update the plan to reflect reality
-
-## Content scope
-
-Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
diff --git a/security-check/SKILL.md b/security-check/SKILL.md
deleted file mode 100644
index ca431e0..0000000
--- a/security-check/SKILL.md
+++ /dev/null
@@ -1,48 +0,0 @@
-# /security-check — Audit Changes for Security Issues
-
-Scan staged or recent changes for secrets, OWASP vulnerabilities, and dependency risks.
-
-## Usage
-
-```
-/security-check [FILE_OR_DIRECTORY]
-```
-
-If no argument is given, audit all staged changes (`git diff --cached`). If there are no staged changes, audit the diff from the last commit.
-
-## Instructions
-
-1. **Gather the changes** to audit:
-   - Staged changes: `git diff --cached`
-   - Or last commit: `git diff HEAD~1`
-   - Or specific path if provided
-
-2. **Check for hardcoded secrets** — scan for patterns:
-   - AWS access keys (`AKIA...`)
-   - Generic secret patterns (`sk-`, `sk_live_`, `sk_test_`)
-   - Password assignments (`password=`, `passwd=`, `secret=`)
-   - Private keys (`-----BEGIN.*PRIVATE KEY-----`)
-   - `.env` file contents committed by mistake
-   - API tokens, JWTs, or bearer tokens in source code
-
-3. **OWASP Top 10 review**:
-   - SQL injection: string concatenation in queries
-   - XSS: unsanitized user input rendered in HTML/JSX
-   - Broken authentication: missing permission checks on endpoints
-   - Insecure deserialization: unsafe deserialization of untrusted data (e.g., eval, exec)
-   - Security misconfiguration: debug mode enabled in production settings
-   - Sensitive data exposure: PII or tokens in log statements
-
-4. **Dependency audit**:
-   - Run `pip-audit` if Python files changed
-   - Run `npm audit` if JavaScript/TypeScript files changed
-   - Flag any new dependencies added without version pinning
-
-5. **Report findings** in a table:
-
-   | Severity | File:Line | Finding | Recommendation |
-   |----------|-----------|---------|----------------|
-
-   Severity levels: CRITICAL, HIGH, MEDIUM, LOW, INFO
-
-6. If no issues found, report "No security issues detected" with a summary of what was checked.
diff --git a/start-work/SKILL.md b/start-work/SKILL.md
deleted file mode 100644
index 311476d..0000000
--- a/start-work/SKILL.md
+++ /dev/null
@@ -1,269 +0,0 @@
----
-name: start-work
-description: Pick up a task (Linear ticket, GitHub issue, todo.org task, or a described scope) and take it through Claim, Justify, Approach, Implement, Verify, and Hand-off. Three user-approval gates separate the phases. The Justify gate covers benefits, costs, engineer/user impact, urgency, effort, alternatives, and a ticket-quality check. The Approach gate covers root cause, risk, refactor prerequisites, test strategy (unit, integration, e2e, pairwise, characterization), migration and backwards-compat, feature-flag question, commit decomposition, and branch name. Implementation uses TDD (red, green, edge cases, refactor audit of every touched file). The audit walks the whole of each touched file against a language-agnostic checklist; every finding is either fixed on this branch or filed as a ticket — nothing is silently dropped. A verify phase exercises the feature end-to-end in the local environment (Playwright against localhost for web projects, scripted manual test otherwise) before the final gate confirms readiness and hands off to the Review-and-Publish flow in commits.md. Use when starting work on a specific task where both "should we" and "how exactly" are worth deliberating. Do NOT use for open-ended bug investigation without a clear target (use debug first), for architectural paradigm exploration (use arch-design), for architectural decision recording (use arch-decide), when the task is trivial and obvious (just do it), or when requirements are still being shaped (use brainstorm).
----
-
-# /start-work: pick up a task, justify it, plan it, build it
-
-Three review gates separate the phases. The user can redirect or kill the work at each one.
-
-1. **Claim.** Mark in-progress, assign, label, verify project.
-2. **Justify (gate 1).** Benefits, costs, impact, urgency, effort, alternatives, ticket quality. Stop for approval.
-3. **Approach (gate 2).** Root cause, risk, tests, migration, flag, commit decomposition. Stop for approval.
-4. **Implement.** TDD red, green, edge cases, refactor audit of every touched file.
-5. **Verify.** End-to-end or scripted manual test in the local environment.
-6. **Ready to commit (gate 3).** Report, stop for approval.
-7. **Hand off** to the Review-and-Publish flow in `commits.md`.
-
-## Usage
-
-```
-/start-work <task-ref>
-```
-
-`<task-ref>` can be:
-
-- A Linear ticket ID or URL: `SE-170`, `https://linear.app/deepsat/issue/SE-170`
-- A GitHub issue URL or number
-- A todo.org heading reference or description: `todo.org:mission-sync refactor`
-- A free-form scope description: "update the mission-card fallback"
-
-If the reference is ambiguous, ask the user to clarify before proceeding.
-
-## Phase 0: eligibility
-
-Skip with a short note and stop if any apply:
-
-- Task is already Done, closed, or merged.
-- Task is assigned to someone else and the user has not asked to take it over.
-- Task is an obvious duplicate of something in-progress.
-- Task description is so vague that even the Justify gate cannot engage. Route to `/brainstorm`.
-
-## Phase 1: claim
-
-Make ownership explicit before any other work starts. The exact steps depend on where the task lives.
-
-### Linear ticket
-
-1. Fetch the ticket with the Linear MCP tools.
-2. Move status to **In Progress**.
-3. Assign the user. If another assignee is already present, add the user as a second assignee. If the Linear API does not accept multiple assignees, post a comment ("Picking this up alongside <existing assignee>") and proceed.
-4. Verify the ticket has exactly one of these labels: **Bug**, **Test**, **Chore**, or **Feature**. If missing or wrong, ask the user which applies and set it.
-5. Verify the ticket's project. If unset or wrong, ask the user which project it belongs to.
-
-### GitHub issue
-
-1. Fetch with `gh issue view <n> --json title,body,state,assignees,labels`.
-2. Assign to the user: `gh issue edit <n> --add-assignee @me`.
-3. Verify the Bug / Test / Chore / Feature label. Add if missing.
-4. Post a comment noting you are starting work.
-
-### Todo.org task
-
-1. Locate the heading the user referenced.
-2. Change the TODO keyword to `DOING`.
-3. Add exactly one tag: `:bug:`, `:feature:`, `:test:`, or `:chore:`. Ask the user which applies if none is obvious. Todo.org is personal, so there is no assignee step.
-
-### Unticketed
-
-1. Note in the session that the work is unticketed.
-2. Ask the user whether to create a ticket or issue retroactively before continuing. If no, proceed but flag in the final commit message that there is no linked ticket.
-
-## Phase 2: justify (gate 1)
-
-Read the task description end to end. Skim the code it references.
-
-Then produce a justification that covers all of these, concisely:
-
-1. **Benefits.** What is better after this lands? Concrete, not abstract.
-2. **Costs.** Time, risk, reviewer bandwidth, ceremony overhead.
-3. **Engineer impact.** Does it make someone's life easier? Catch a class of bug? Remove friction?
-4. **End-user impact.** Behavioral change? Visible? Invisible-but-protective?
-5. **Downsides.** What do we lose? Where would we regret doing this?
-6. **Urgency and priority fit.** Does this align with current goals or an upcoming deadline? If the project has committed deadlines, explicitly check this against them. Anything not obviously on the critical path should be called out as "deferrable."
-7. **Effort estimate.** S (under 1 hour), M (1 hour to 1 day), L (over 1 day). Rough is fine.
-8. **Alternatives considered.** Is there a cheaper way? Can we defer? Can we address the root cause via a different path?
-9. **Ticket quality check.** Is scope clear, are acceptance criteria concrete, are reproduction steps present for bugs? If **not clear**, stop and ask the user to choose one of:
-   - (a) Bounce to `/brainstorm` to refine the ticket first.
-   - (b) Ping the ticket author for clarification.
-   - (c) Supply the missing info themselves right now, if it is easy for them to do so.
-
-### Gate
-
-Present the justification to the user. Stop. Wait for questions and explicit approval ("approved", "proceed", or equivalent) before starting Phase 3.
-
-Do not generate the approach while waiting. The user may kill the task at this gate, and any pre-generated approach would be wasted work.
-
-If the user kills the task, roll back the Phase 1 claim: move the ticket back to its prior status, remove the assignment you added, and remove the label you added (if any).
-
-## Phase 3: approach (gate 2)
-
-Read the referenced code end to end. Understand the surrounding context: callers, callees, existing tests, adjacent modules.
-
-Then produce an approach that covers:
-
-1. **Root cause.** For bugs, where the bug originates, not just where it surfaces. For features, which layer owns the new behavior.
-2. **Code that changes.** Files and functions, with a rough line-count estimate.
-3. **Risk.** Who and what does this affect? Local (one file) or does it ripple? Flag anything that touches shared state, public APIs, or core data flow.
-4. **Refactor prerequisites.** Does the codebase need restructuring before this fix is easy? If yes, that is a separate ticket and should be done first.
-5. **Characterization tests.** If modifying existing untested code, write characterization tests first to lock behavior before changing it (see `testing.md`).
-6. **Test strategy decomposition.** Which of these are needed, and roughly how many of each:
-   - Unit tests.
-   - Integration tests.
-   - E2E tests.
-   - Pairwise or combinatorial tests, if parameter-heavy (see `/pairwise-tests`).
-7. **Migration and backwards-compat surface.** DB migration? API contract change? Frontend consumer impact? Config shape change? Flag if yes and describe the scope.
-8. **Feature flag.** Does this ship behind a flag or direct? Always worth asking once.
-9. **Commit decomposition.** One commit, or N commits? Each commit should be one logical change per `commits.md`. Size the Review-and-Publish ceremony ahead of time.
-10. **Branch name.** Following the project convention: `fix/<ID>-slug`, `feature/<ID>-slug`, `chore/<ID>-slug`, or `test/<ID>-slug`. Unticketed work uses a short descriptive slug.
-
-### Gate
-
-Present the approach to the user. Stop. Wait for questions and explicit approval before starting Phase 4.
-
-If the user redirects the approach, update the plan and re-present rather than silently adjusting during implementation.
-
-## Phase 4: implement (TDD)
-
-Follow the red-green-refactor cycle from `testing.md`.
-
-1. **Create the branch** using the name decided in Phase 3.
-2. **Red.** Write a failing test that demonstrates the bug or captures the new desired behavior. Run it. Confirm it fails for the right reason, not because the test itself is broken. Commit as `test: <desc>`.
-3. **Green.** Write the minimal code to make the test pass. Do not generalize yet. Do not add features the test does not require. Commit as `fix:` or `feat:`.
-4. **Edge cases.** Add tests in all three categories per `testing.md`:
-   - Normal: happy path, typical input.
-   - Boundary: empty inputs, nulls, minimum and maximum values, single-element collections, Unicode, long strings, time and timezone boundaries, concurrent access.
-   - Error: invalid inputs, missing required parameters, permission denied, resource exhaustion, malformed data, network failures.
-   Commit as `test: add edge cases for <desc>`.
-5. **Refactor audit.** After tests are green, audit every file you touched in this task — not just the code you wrote, but the whole file, top to bottom. The question is no longer "is my new code clean?" but "what refactoring opportunities exist in this file, and which belong on this branch versus a follow-up ticket?"
-
-   Work the touched-file list explicitly. For each file, walk the checklist below (a through h), note every candidate, and decide its disposition. "Touching" a file means any modification, however small — a one-line edit still qualifies the whole file for audit. Keep tests green throughout. If they go red during a change, you have altered behavior, not just form — stop and decide whether the change is intentional before proceeding.
-
-   The checklist is language-agnostic. The same smells appear in Python, TypeScript, Go, Elisp, Rust, shell, SQL, and anything else.
-
-   a. **Stale documentation.** Comments, docstrings, file headers, module-level summaries, READMEs, ADRs, architectural diagrams, or any prose that now contradicts the code. Update or delete. Prefer deletion when the documentation duplicates what the code, the tooling, or the runtime config already communicates — duplicated information is rotted documentation waiting to happen. The test: if a future reader would learn nothing from the doc that the code does not already say, drop it.
-
-   b. **Duplication.** Three distinct kinds:
-      - *Logic duplication*: the same computation or control flow appearing in multiple places. Extract when it appears three or more times, or when the duplication crosses an abstraction boundary, or when a future divergence between the copies would be a real bug. Two occurrences of a simple expression usually does not justify extraction. Three similar lines beats a premature abstraction.
-      - *Literal duplication*: repeated strings, regexes, magic numbers, paths, URLs, error codes, keywords — any value that would need to change together. A shared constant is cheap insurance and makes the intent explicit.
-      - *Intra-function expression duplication*: the same non-trivial expression evaluated twice inside one scope. Bind it to a local name once. Shorter function and no risk of the two expressions drifting apart when someone edits one.
-
-   c. **Naming drift.** Names that describe what the identifier used to do, not what it does now. Names that mix abstraction levels (a high-level operation named after its implementation detail). Inconsistency across the module: `get_foo` next to `fetch_bar` next to `load_baz` for operations that are semantically the same. Pick one verb per concept and rename. Renaming is cheap in any language with a competent tool, and clarity compounds.
-
-   d. **Scope and cohesion.** Functions doing two things — a name with "and" or two clauses joined by commas is the tell. Split. Related functions scattered across the file. Cluster them with a comment header or section break. Unrelated functions grouped only by a superficial property (all private, all on the same keybinding, all using the same framework feature). Group by purpose, not accident. Code reads like a book. Related concepts should be neighbors.
-
-   e. **Premature abstraction.** Helpers with one caller that do not document intent better than the inline version — inline them. Parameters always passed the same value by every caller — drop them. Configuration knobs that no caller varies — delete them. Interfaces with a single implementation and no realistic second — collapse them. Abstractions built "for future flexibility" that have not been exercised are carrying cost with no benefit. Speculative generality is a tax you pay on every read.
-
-   f. **Dead code.** Unused imports, uncalled functions, variables never referenced, parameters never consumed inside the body, types no one uses. Commented-out blocks kept "in case we need it later." You will not need them, and if you do, the version control history has them. Delete.
-
-   g. **Error handling parity.** Similar operations emitting different error shapes (exceptions vs. return values vs. log-and-continue vs. silent swallow). Error messages that expose internal state unhelpfully, or that strip the context a caller needs to act. Guards present in some parallel paths but missing in others. Parity beats novelty — if three siblings behave the same way, the fourth should too, or have a documented reason not to.
-
-   h. **Test smells.** Tests are code and rot the same way. Copy-pasted fixtures that should parametrize. Assertions that lock to implementation (exact strings, internal structure, field order) rather than behavior. Dead mocks that stub something the test no longer exercises. Mocks of internal helpers rather than external boundaries. See `testing.md` "Signs of overmocking."
-
-   i. **Out-of-file scope.** The audit stops at the touched-file boundary. If you happen to notice a smell in a file you did not touch, do not expand the audit into it — file a ticket so the finding is not lost. Drive-by audits across the codebase balloon review time and break the working set. Exception: a rename or structural change that would leave the codebase inconsistent if shipped half-done is in scope and required.
-
-   **Disposition for each candidate.** Every candidate must land in one of three buckets. There is no silent drop.
-
-   - *Fix now, fold into the related feature or fix commit*: small, directly related to the task's work on this file, obviously clearer, no new risk surface.
-   - *Fix now, separate `refactor:` commit on this branch*: related to the surface you touched but larger in scope, or reshaping something non-trivial. Separating it keeps the feature commit focused for review.
-   - *File a ticket or todo.org entry*: the smell is real but unrelated to this task, lives in a touched file outside the task's working set, or was noticed in an untouched file. Filing — not skipping — is the default for anything that does not fit the two "fix now" cases. Capture enough detail that a future session can act on it: file path, line or function, smell category (one of a through h), and a one-line description.
-
-   If a candidate feels too small to fix and too small to file, it was either not a real smell, or you are talking yourself out of a two-line todo entry. Write the entry.
-
-   **Stop conditions.** The *audit* is complete when every touched file has been walked and every candidate has a disposition. The *fixing* stops earlier: ask "would a reasonable reviewer flag this?" of the remaining in-scope candidates. If the answer is no, stop fixing and file the rest. Shipping beats polishing, but filing beats forgetting.
-
-   Commit: group meaningful refactors into a `refactor: <desc>` commit when they stand on their own. Fold small tweaks into the associated feature or fix commit when they are tied to the same scope. The commit history should let a future reader see intent per commit, not a mixture of "did the thing" and "also cleaned up five unrelated corners."
-
-### Constraints
-
-- **Root cause, not symptom.** If the task is a bug, fix where the bug originates, not where it surfaces.
-- **No drive-by refactoring.** Only change code the task requires. Unrelated cleanups go in a separate ticket.
-- **No hypothetical-future code.** Solve the current problem. Do not design for requirements that have not been asked.
-- **Framework and library code is trusted.** Mock at boundaries (network, time, file I/O), not at internal helpers (see `testing.md` "Signs of overmocking").
-
-## Phase 5: verify end-to-end
-
-Unit tests prove the internals are green. They cannot prove the feature works for the user. Before the ready-to-commit gate, exercise the feature end-to-end on your local machine — a running dev server on localhost for web work, the actual editor or CLI for everything else. Never production. Production verification is a separate concern that belongs to release procedures, not to a pre-merge workflow. Skipping this phase is how "all tests green" becomes "shipped broken" — it caught a one-second browser-open timeout in local testing that no unit test had any way to see.
-
-Pick the verification mode that matches the project's stack.
-
-### If the project has browser-automatable UI
-
-Web apps, dashboards, SPAs, admin tools, any feature reachable through a browser. Write a Playwright end-to-end test that exercises:
-
-- The happy path the feature was built for, clicking through as a user would.
-- Any boundary or error cases that unit tests could not reach: authentication, cross-page navigation, state across reloads, deep-link URLs, permission-denied flows.
-- The user-observable failure mode of any known upstream dependency, mocked or stubbed where needed.
-
-The E2E test lives in the repo alongside the feature and runs in CI like any other test. Delegate the test authoring to `/playwright-js` for JavaScript or TypeScript stacks, `/playwright-py` for Python stacks. Do not write Playwright code from scratch when those skills are available.
-
-### If the project has no browser UI
-
-CLI tools, libraries, Emacs or editor configuration, shell scripts, daemons, anything where there is no DOM to automate. Lead the user through a scripted manual test. Provide:
-
-1. **An explicit sequence of steps.** Specific commands to run, specific keys to press, specific files to open. Not "try the feature" but "open file X, press C-; h d, pick draft Y."
-2. **The expected observable outcome at each step.** What message should appear in the echo area, what buffer should show, what file should change on disk, what exit code the process should return, what the browser should display. One expected outcome per step so failures pinpoint where.
-3. **Failure signals.** What broken looks like. "If you see nothing in the echo area, the binding did not fire. If you see `No #+hugo_draft keyword`, the buffer has no Hugo front matter." Pattern-matching against known failure modes shortens diagnosis.
-
-Wait for the user to walk through the steps and report back. Do not skip ahead. Do not assume success without the user's confirmation. If the user reports a failure, route the failure back through Phase 3 (if the approach was wrong) or Phase 4 (if the implementation was wrong), then re-verify.
-
-### In both modes
-
-- **Run against a clean environment.** Restart the process, clear the cache, open a fresh browser session, re-evaluate the loaded module. Stale state masks real bugs — today's "toggling the draft doesn't work" turned out to be stale code in a running Emacs.
-- **Verify failure paths, not just the happy path.** A feature that works when nothing goes wrong is half-tested. Force an error path if the feature has one.
-- **If verification reveals a unit-test gap, add the missing unit test before gate 3.** A bug you hit manually is a bug worth locking in with a test so it cannot regress.
-- **Keep the verification artifact.** For browser work, the Playwright test stays in the repo. For manual scripts, paste the steps into the Phase 6 handoff report so a reviewer can re-verify on request.
-
-### Stop condition
-
-Every verified scenario produces its expected observable outcome. Any failure is routed back to Phase 3 or Phase 4 — not papered over, not marked as "known issue" without filing a follow-up ticket.
-
-## Phase 6: ready to commit (gate 3)
-
-Before handing off to the Review-and-Publish flow, stop and report:
-
-- What was done. Files changed, tests added, test-suite result.
-- What was verified in Phase 5, and how. For manual scripts, paste the step list so a reviewer can re-run the verification. For Playwright tests, name the test file.
-- Any deviations from the Phase 3 approach that happened during implementation, and why.
-- Follow-up tickets filed during the refactor audit, listed by ID so the reviewer can see what was deferred and why. "Surfaced" is not enough — these are actually filed before gate 3 clears, not left as a mental note.
-
-Wait for explicit approval before starting the commit and PR ceremony.
-
-If deviations are significant, the user may want to loop back and revise the approach before publishing.
-
-## Phase 7: hand off to Review-and-Publish
-
-Follow `commits.md` exactly. Summary of the flow:
-
-1. Run `/review-code --staged` before each commit, or `/review-code` on the whole branch before the PR. Block on Critical or Important findings.
-2. Draft the commit message to `/tmp/commit-<slug>.md`. Run `humanizer`. Apply the plain-English pass. Replace semicolons. Stop for approval.
-3. After approval, commit.
-4. Draft the PR body to `/tmp/pr-<ticket-or-slug>.md`. Body must include a `Linear:` or equivalent cross-link line. Run `humanizer`. Apply the plain-English pass. Replace semicolons. Stop for approval.
-5. After approval, push and run `gh pr create`.
-6. Post the PR URL back to the Linear ticket, GitHub issue, or todo.org entry.
-7. Move the Linear or GitHub status to **Dev Review**. Todo.org has no equivalent. Leave the todo.org entry as `DOING` until the PR merges.
-
-## Anti-patterns
-
-- **Skipping the Justify gate.** "This is obviously worth doing" is exactly what the gate exists to verify. If the answer really is obvious, the gate takes thirty seconds.
-- **Skipping the Approach gate.** Implementation without a plan is how scope creep happens. It is also how the user loses the chance to redirect.
-- **Marking a task In Progress before Phase 2 approval.** If the Justify gate kills the task, the Claim should roll back cleanly.
-- **Blurring the gates.** Write the justification, stop, wait. Do not pre-generate the approach while waiting. The user may kill the task and the pre-work gets wasted.
-- **Treating Feature tasks as skippable on the Approach gate.** Features especially need the migration, backwards-compat, and feature-flag questions answered up front.
-- **Letting the TDD cycle drift.** If the test passes before the implementation is written, the test is wrong. Confirm the red before moving to green.
-- **Skipping the refactor audit.** A green test suite is necessary, not sufficient. Walking the touched-file list against the refactor checklist catches the stale comment, the naming drift, and the duplicated expression that a reviewer will otherwise flag. Leave the code better than you found it, within scope — and file what you cannot fix on this branch.
-- **Auditing only the code you wrote.** "I only changed one line, the rest of the file isn't my problem" — it is, to the extent that you file what you see. The audit is per touched file, not per diff hunk. Anything noticed in a touched file lands somewhere: this branch or a ticket.
-- **Skipping the verify phase.** Green unit tests do not mean the feature works for the user. A one-second delay that looks fine on a mocked process is a broken experience on a real Hugo build. Five minutes of scripted manual testing or a Playwright run catches the gap before a reviewer does.
-
-## Cross-references
-
-- `commits.md`: the Review-and-Publish flow used in Phase 7.
-- `testing.md`: TDD discipline, edge case categories, characterization tests, overmocking signals.
-- `subagents.md`: dispatch contract for parallel code research during Phase 3 if the code surface is large.
-- `/review-code`: runs inside Phase 7.
-- `/brainstorm`: route here from the Phase 2 ticket-quality branch.
-- `/arch-design`: route here if Phase 3 reveals an architectural question the task cannot answer on its own.
-- `/arch-decide`: route here if Phase 3 surfaces a decision worth recording as an ADR.
-- `/debug`: route here if Phase 2 reveals the task needs investigation before it can be justified.
-- `/pairwise-tests`: route here from Phase 3 if the test matrix warrants combinatorial coverage.
-- `/playwright-js`, `/playwright-py`: route here from Phase 5 to author E2E tests for web projects.
-- 
cgit v1.2.3