1 files changed, 245 insertions, 0 deletions
diff --git a/arch-design/SKILL.md b/arch-design/SKILL.md
new file mode 100644
index 0000000..d1381e3
--- /dev/null
+++ b/arch-design/SKILL.md
@@ -0,0 +1,245 @@
+---
+name: arch-design
+description: Shape the architecture of a new or restructured software project through structured intake (quality attributes, stakeholders, constraints, scale, change drivers), then propose candidate architectural paradigms with honest trade-off analysis and a recommended direction. Paradigm-agnostic — evaluates options across layered, hexagonal, microservices, event-driven, CQRS, modular-monolith, serverless, pipe-and-filter, DDD, and others. Outputs a brief at `.architecture/brief.md` that downstream skills (arch-decide, arch-document, arch-evaluate) read. Use when starting a new project or service, restructuring an existing one, choosing a tech stack, or formalizing architecture before implementation. Do NOT use for bug fixing, code review, small feature additions, documenting an existing architecture (use arch-document), evaluating an existing architecture against a brief (use arch-evaluate), or recording specific individual decisions (use arch-decide). Part of the arch-* suite (arch-design / arch-decide / arch-document / arch-evaluate).
+---
+
+# Architecture Design
+
+Elicit the problem, surface the real drivers, propose a fit, and commit it to writing. One working session yields a `.architecture/brief.md` that anchors every downstream architectural decision.
+
+## When to Use This Skill
+
+- Starting a new project, service, or major subsystem
+- Restructuring or re-platforming an existing system
+- Selecting a primary tech stack for a green-field effort
+- Establishing a formal architecture before a team scales
+- Preparing a spike or prototype you want to keep coherent
+
+## When NOT to Use This Skill
+
+- Bug fixing or defect investigation
+- Small feature additions inside an existing architecture
+- Recording a single decision (use `arch-decide`)
+- Producing formal documentation from a known architecture (use `arch-document`)
+- Auditing an existing codebase against its stated architecture (use `arch-evaluate`)
+
+## Workflow
+
+The skill runs in five phases. Each produces a section of the output brief. Do not skip phases — the trade-off analysis is only as good as the intake.
+
+1. **Intake** — elicit stakeholders, domain, scale, timeline, team
+2. **Quality attributes** — prioritize (can't have everything)
+3. **Constraints** — technical, organizational, legal, cost
+4. **Candidate paradigms** — shortlist 2-4 with honest trade-off analysis
+5. **Recommendation** — pick one, justify it, flag open questions for ADRs
+
+## Phase 1 — Intake
+
+Ask the user to answer each. Short answers are fine; vague answers mean return to the question.
+
+**System identity**
+- What is this system? (One sentence.)
+- Who uses it? Human end users, other services, both?
+- What domain is it in? (commerce, health, comms, finance, ops, etc.)
+
+**Scale**
+- Expected traffic at launch and in 12 months? (RPS, MAU, payload sizes)
+- Data volume at launch and in 12 months? (rows/docs, GB)
+- Geographic distribution? (single region, multi-region, edge)
+
+**Team**
+- Team size now? Growing?
+- Existing language/stack expertise?
+- Operational maturity? (on-call rotation, observability tooling, CI/CD)
+
+**Timeline**
+- When does it need to be in production?
+- Is this replacing something? What's the migration path?
+
+**Change drivers**
+- What forces are likely to reshape this system? (new markets, regulatory, volume, organizational)
+
+## Phase 2 — Quality Attributes
+
+List the relevant quality attributes and force the user to rank them 1-5 (1 = critical, 5 = nice-to-have). You can't optimize everything; the ranking surfaces real priorities.
+
+Standard list (use this set; add domain-specific ones if relevant):
+
+- **Performance** (latency, throughput under load)
+- **Scalability** (horizontal, vertical, elastic)
+- **Availability** (uptime target, failure tolerance)
+- **Reliability** (data durability, correctness under partial failure)
+- **Security** (authn/z, data protection, threat model)
+- **Maintainability** (readability, testability, onboarding speed)
+- **Observability** (logs, metrics, tracing, debuggability)
+- **Deployability** (release cadence, rollback speed)
+- **Cost** (infra, operational, total cost of ownership)
+- **Compliance** (regulatory, audit, data residency)
+- **Interoperability** (integration with existing systems, APIs, standards)
+- **Flexibility / evolvability** (ease of adding features, changing direction)
+
+Document the ranking verbatim in the brief. Future decisions hinge on it.
+
+## Phase 3 — Constraints
+
+Enumerate what's fixed. Each constraint narrows the design space — make them explicit so trade-offs don't hide.
+
+- **Technical**: existing infrastructure, mandated languages/platforms, integration points that can't move
+- **Organizational**: team structure (Conway's Law — the org shape becomes the arch shape), existing expertise, hiring plan
+- **Legal/regulatory**: GDPR, HIPAA, FedRAMP, data residency, audit retention
+- **Cost**: budget ceiling, licensing limits, infra cost targets
+- **Timeline**: hard dates from business, regulatory deadlines
+- **Compatibility**: backward-compatibility with existing clients, API contracts, data formats
+
+## Phase 4 — Candidate Paradigms
+
+Pick 2-4 candidates that plausibly fit the quality-attribute ranking and constraints. Analyze each honestly — include the reasons it would fail, not just succeed.
+
+Common paradigms to consider:
+
+| Paradigm | Fits when… | Doesn't fit when… |
+|---|---|---|
+| **Modular monolith** | Small team, fast iteration, strong module boundaries, single deployment OK | Independent team scaling, different availability SLAs per module, polyglot requirements |
+| **Microservices** | Multiple teams, independent deploy cadences, polyglot, different scaling needs | Small team, tight transactional consistency needs, low operational maturity |
+| **Layered (n-tier)** | CRUD-heavy, clear request/response, team familiar with MVC | Complex domain logic, event-driven needs, async workflows dominate |
+| **Hexagonal / Ports & Adapters** | Business logic isolation, multiple interface types (HTTP + CLI + queue), testability priority | Trivial domains, overhead outweighs benefit |
+| **Event-driven / pub-sub** | Async workflows, fan-out to multiple consumers, decoupled evolution | Strong ordering + consistency needs, small team, low operational maturity |
+| **CQRS** | Read/write workload asymmetry, different optimization needs, audit trail required | Simple CRUD, no asymmetry, overhead not justified |
+| **Event sourcing** | Audit trail critical, temporal queries needed, reconstruction from events valuable | Simple state, team lacks event-sourcing expertise, storage cost prohibitive |
+| **Serverless (FaaS)** | Event-driven + variable load + fast iteration + accept vendor lock-in | Steady high load, latency-sensitive, long-running processes, tight cost control |
+| **Pipe-and-filter / pipeline** | Data transformation workflows, ETL, stream processing | Interactive request/response, low-latency |
+| **Space-based / in-memory grid** | Extreme throughput, elastic scale, in-memory OK | Strong durability required from day one, small scale |
+| **DDD (tactical)** | Complex domain, domain experts available, long-lived system | Simple CRUD, no real domain complexity, short-lived system |
+
+For each candidate, document:
+
+- **Summary** — one paragraph on what the architecture would look like
+- **Why it fits** — map to the ranked quality attributes
+- **Why it might not** — the honest cons for this specific context
+- **Cost** — team learning curve, operational overhead, infra impact
+- **Open questions** — what would need to be answered or decided via ADR
+
+## Phase 5 — Recommendation
+
+Choose one paradigm. Justify it by:
+
+- Which top-3 quality attributes the choice serves
+- Which constraints it respects
+- Why the rejected alternatives were rejected (not just "we picked X"; say why *not* Y)
+- What you're trading away (be explicit)
+
+Flag items that need their own ADRs — use `arch-decide` to record them. Examples:
+
+- Primary database choice
+- Message bus vs. direct calls
+- Sync vs. async inter-service comms
+- Multi-tenancy approach
+- Authentication/authorization boundary
+
+## Output: `.architecture/brief.md`
+
+Write the final brief to `.architecture/brief.md` in the project root. Use this structure:
+
+```markdown
+# Architecture Brief — <Project Name>
+
+**Date:** <YYYY-MM-DD>
+**Authors:** <names>
+**Status:** Draft | Accepted | Revised
+
+## 1. System
+
+<One-paragraph description of what the system does.>
+
+## 2. Stakeholders and Users
+
+- **Primary users:** …
+- **Secondary users:** …
+- **Operators:** …
+- **Integrators / dependent systems:** …
+
+## 3. Scale Targets
+
+| Metric | Launch | +12 months |
+|---|---|---|
+| RPS | … | … |
+| Data volume | … | … |
+| Geographic spread | … | … |
+
+## 4. Quality Attributes (Prioritized)
+
+| Rank | Attribute | Notes |
+|---|---|---|
+| 1 | … | critical driver |
+| 2 | … | … |
+
+## 5. Constraints
+
+- **Technical:** …
+- **Organizational:** …
+- **Legal/Regulatory:** …
+- **Cost:** …
+- **Timeline:** …
+
+## 6. Candidates Considered
+
+### Candidate A — <paradigm>
+
+Summary. Fit. Cons. Cost. Open questions.
+
+### Candidate B — <paradigm>
+
+…
+
+## 7. Recommendation
+
+**Chosen paradigm:** …
+
+**Rationale:**
+- …
+
+**Trade-offs accepted:**
+- …
+
+**Rejected alternatives and why:**
+- …
+
+## 8. Open Decisions (Candidates for ADRs)
+
+- [ ] Primary database — driver: …
+- [ ] …
+
+## 9. Next Steps
+
+- Run `arch-decide` for each open decision above
+- Run `arch-document` to produce the full arc42 document
+- Run `arch-evaluate` once implementation begins to audit conformance
+```
+
+## Review Checklist
+
+Before marking the brief Accepted:
+
+- [ ] Every quality attribute is ranked; no "all critical"
+- [ ] Every constraint is explicit; no "we probably can't"
+- [ ] At least 2 candidates considered; each has real cons
+- [ ] Recommendation names what it's trading away
+- [ ] Open decisions listed; each will become an ADR via `arch-decide`
+- [ ] Stakeholders have seen and (ideally) approved
+
+## Anti-Patterns
+
+- **"All quality attributes are critical."** They aren't. Force ranking.
+- **"Let's go microservices."** Without a ranking, constraints, and team context, that's cargo-culting.
+- **Single candidate.** One option means the user already decided; you're documenting, not designing.
+- **Silent trade-offs.** If you choose availability, you're trading latency or cost. Say so.
+- **No open decisions.** A real architecture has open questions. Listing none means you haven't looked hard.
+- **"Design doc" with no implementation implications.** The brief must be actionable — it drives ADRs, documentation, and evaluation.
+
+## Hand-Off
+
+After the brief is Accepted:
+
+- `arch-decide` — record each Open Decision as an ADR under `docs/adr/`
+- `arch-document` — expand into a full arc42-structured document under `docs/architecture/` with C4 diagrams
+- `arch-evaluate` — once code exists, audit it against this brief