---
description: Shape the architecture of a new or restructured software project through structured intake (quality attributes, stakeholders, constraints, scale, change drivers), then propose candidate architectural paradigms with honest trade-off analysis and a recommended direction. Paradigm-agnostic — evaluates options across layered, hexagonal, microservices, event-driven, CQRS, modular-monolith, serverless, pipe-and-filter, DDD, and others. Outputs a brief at `.architecture/brief.md` that downstream skills (arch-decide, arch-document, arch-evaluate) read. Use when starting a new project or service, restructuring an existing one, choosing a tech stack, or formalizing architecture before implementation. Do NOT use for bug fixing, code review, small feature additions, documenting an existing architecture (use arch-document), evaluating an existing architecture against a brief (use arch-evaluate), or recording specific individual decisions (use arch-decide). Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
disable-model-invocation: true
---

# Architecture Design

Elicit the problem, surface the real drivers, propose a fit, and commit it to writing. One working session yields a `.architecture/brief.md` that anchors every downstream architectural decision.
## When to Use This Skill

- Starting a new project, service, or major subsystem
- Restructuring or re-platforming an existing system
- Selecting a primary tech stack for a green-field effort
- Establishing a formal architecture before a team scales
- Preparing a spike or prototype you want to keep coherent

## When NOT to Use This Skill

- Bug fixing or defect investigation
- Small feature additions inside an existing architecture
- Recording a single decision (use `arch-decide`)
- Producing formal documentation from a known architecture (use `arch-document`)
- Auditing an existing codebase against its stated architecture (use `arch-evaluate`)

## Workflow

The skill runs in six phases. Each produces a section of the output brief. Do not skip phases — the trade-off analysis is only as good as the intake.

1. **Intake** — elicit stakeholders, domain, scale, timeline, team
2. **Quality attributes** — prioritize (can't have everything)
3. **Constraints** — technical, organizational, legal, cost
4. **Trust, data, and compliance** — trust boundaries, data classification, abuse cases, privacy, compliance evidence, ownership
5. **Candidate paradigms** — shortlist 2-4 with honest trade-off analysis
6. **Recommendation** — pick one, justify it, flag open questions for ADRs

## Phase 1 — Intake

Ask the user to answer each question below. Short answers are fine; a vague answer means returning to the question.

**System identity**

- What is this system? (One sentence.)
- Who uses it? Human end users, other services, both?
- What domain is it in? (commerce, health, comms, finance, ops, etc.)

**Scale**

- Expected traffic at launch and in 12 months? (RPS, MAU, payload sizes)
- Data volume at launch and in 12 months? (rows/docs, GB)
- Geographic distribution? (single region, multi-region, edge)

**Team**

- Team size now? Growing?
- Existing language/stack expertise?
- Operational maturity? (on-call rotation, observability tooling, CI/CD)

**Timeline**

- When does it need to be in production?
- Is this replacing something? What's the migration path?

**Change drivers**

- What forces are likely to reshape this system? (new markets, regulatory, volume, organizational)

## Phase 2 — Quality Attributes

List the relevant quality attributes and force the user to rank them 1-5 (1 = critical, 5 = nice-to-have). You can't optimize everything; the ranking surfaces real priorities.

Standard list (use this set; add domain-specific ones if relevant):

- **Performance** (latency, throughput under load)
- **Scalability** (horizontal, vertical, elastic)
- **Availability** (uptime target, failure tolerance)
- **Reliability** (data durability, correctness under partial failure)
- **Security** (authn/z, data protection, threat model)
- **Maintainability** (readability, testability, onboarding speed)
- **Observability** (logs, metrics, tracing, debuggability)
- **Deployability** (release cadence, rollback speed)
- **Cost** (infra, operational, total cost of ownership)
- **Compliance** (regulatory, audit, data residency)
- **Interoperability** (integration with existing systems, APIs, standards)
- **Flexibility / evolvability** (ease of adding features, changing direction)

Document the ranking verbatim in the brief. Future decisions hinge on it.

## Phase 3 — Constraints

Enumerate what's fixed. Each constraint narrows the design space — make them explicit so trade-offs don't hide.
- **Technical**: existing infrastructure, mandated languages/platforms, integration points that can't move
- **Organizational**: team structure (Conway's Law — the org shape becomes the arch shape), existing expertise, hiring plan
- **Legal/regulatory**: GDPR, HIPAA, FedRAMP, data residency, audit retention
- **Cost**: budget ceiling, licensing limits, infra cost targets
- **Timeline**: hard dates from business, regulatory deadlines
- **Compatibility**: backward-compatibility with existing clients, API contracts, data formats

## Phase 4 — Trust, Data, and Compliance

Security is one quality attribute in Phase 2, but several security and privacy concerns shape the structure itself, not just a later hardening pass. Surface them here, before the paradigm shortlist, so the architecture is drawn around them rather than retrofitted by a downstream `security-check`. These answers feed directly into where boundaries fall, how data flows, and which paradigms stay viable.

**Trust boundaries**

- Where does control change hands? (public internet → edge, edge → service mesh, service → data store, tenant → tenant)
- Which components are trusted, which are hostile-by-default, and what authenticates across each boundary?
- What's the blast radius if any one component is compromised?

**Data classification**

- What sensitivity of data flows where? Classify it (public, internal, confidential, regulated — PII, PHI, payment, secrets).
- Which stores and which paths carry the most sensitive class? That path constrains the design more than the average one.
- Where does sensitive data cross a trust boundary, and what protects it there (encryption in transit/at rest, tokenization, field-level controls)?

**Abuse and misuse cases**

- Who would attack this, and to what end? (fraud, exfiltration, denial of service, account takeover, abuse of a feature for an unintended purpose)
- For the top abuse cases, what's the architectural mitigation? (rate limiting at the edge, isolation between tenants, an audit trail that can't be tampered with)
- These are the inverse of user stories — name the attacker's goal, then design against it.

**Privacy constraints**

- What data minimization, retention, deletion, and consent obligations apply? (right to erasure, purpose limitation, data residency)
- Do any of these force a structural choice — separable per-user data, region-pinned stores, a deletion path that reaches every copy?

**Compliance evidence**

- What must be *provable* to an auditor, not just true? (access logs, change history, encryption attestation, data-flow records)
- Generating that evidence is an architectural requirement — an immutable audit log or a tamper-evident event store may be load-bearing, not optional.

**Operational ownership**

- Who owns each component in production — on-call, patching, secret rotation, incident response?
- Conway's Law applies to security too: a boundary no team owns is a boundary no one defends.

Document the trust boundaries, the data classification, and the top abuse cases in the brief. They become inputs to the trade-off analysis in the next phase and seed open decisions for ADRs.

## Phase 5 — Candidate Paradigms

Two distinct choices live here, and conflating them is a classic mistake. First pick the **paradigm** — the top-level structural choice that decides how the system is deployed and how its pieces relate. Then layer on the **tactical patterns** that compose onto that paradigm. They are not mutually exclusive options on one menu. A modular monolith built with DDD and CQRS in one module is a coherent, common choice — not a contradiction.

### Step 1 — Pick the paradigm

The paradigm is a whole-system decision; you choose one. Pick 2-4 candidates that plausibly fit the quality-attribute ranking and constraints. Analyze each honestly — include the reasons it would fail, not just succeed.

| Paradigm | Fits when… | Doesn't fit when… |
|---|---|---|
| **Modular monolith** | Small team, fast iteration, strong module boundaries, single deployment OK | Independent team scaling, different availability SLAs per module, polyglot requirements |
| **Microservices** | Multiple teams, independent deploy cadences, polyglot, different scaling needs | Small team, tight transactional consistency needs, low operational maturity |
| **Layered (n-tier)** | CRUD-heavy, clear request/response, team familiar with MVC | Complex domain logic, event-driven needs, async workflows dominate |
| **Event-driven / pub-sub** | Async workflows, fan-out to multiple consumers, decoupled evolution | Strong ordering + consistency needs, small team, low operational maturity |
| **Serverless (FaaS)** | Event-driven + variable load + fast iteration + accept vendor lock-in | Steady high load, latency-sensitive, long-running processes, tight cost control |
| **Pipe-and-filter / pipeline** | Data transformation workflows, ETL, stream processing | Interactive request/response, low-latency |
| **Space-based / in-memory grid** | Extreme throughput, elastic scale, in-memory OK | Strong durability required from day one, small scale |

### Step 2 — Compose tactical patterns onto it

Tactical patterns refine the internal structure of a chosen paradigm, often per-module or per-service rather than system-wide. You can adopt several at once, or none. They sharpen how a part of the system is built; they don't replace the paradigm. Map each candidate paradigm to the tactical patterns worth applying inside it.

| Tactical pattern | Apply when… | Skip when… |
|---|---|---|
| **DDD (tactical)** | Complex domain, domain experts available, long-lived system | Simple CRUD, no real domain complexity, short-lived system |
| **Hexagonal / Ports & Adapters** | Business logic isolation, multiple interface types (HTTP + CLI + queue), testability priority | Trivial domains, overhead outweighs benefit |
| **CQRS** | Read/write workload asymmetry, different optimization needs, audit trail required | Simple CRUD, no asymmetry, overhead not justified |
| **Event sourcing** | Audit trail critical, temporal queries needed, reconstruction from events valuable | Simple state, team lacks event-sourcing expertise, storage cost prohibitive |

Examples of composition: a modular monolith with hexagonal modules and CQRS on the reporting module; microservices where two services use DDD aggregates and one uses event sourcing for its audit-critical ledger; a pipeline whose stages are hexagonal for testability. The paradigm is the frame; the tactical patterns are how you build inside it.

For each candidate, document:

- **Summary** — one paragraph on what the architecture would look like
- **Tactical patterns** — which patterns from Step 2 compose onto it, and where (which module or service)
- **Why it fits** — map to the ranked quality attributes
- **Why it might not** — the honest cons for this specific context
- **Cost** — team learning curve, operational overhead, infra impact
- **Open questions** — what would need to be answered or decided via ADR

## Phase 6 — Recommendation

Choose one paradigm and name the tactical patterns that compose onto it.
Justify it by:

- Which top-3 quality attributes the choice serves
- Which constraints it respects
- How it answers the trust boundaries, data classification, and abuse cases from Phase 4
- Why the rejected alternatives were rejected (not just "we picked X"; say why *not* Y)
- What you're trading away (be explicit)

Flag items that need their own ADRs — use `arch-decide` to record them. Examples:

- Primary database choice
- Message bus vs. direct calls
- Sync vs. async inter-service comms
- Multi-tenancy approach
- Authentication/authorization boundary

## Output: `.architecture/brief.md`

Write the final brief to `.architecture/brief.md` in the project root. Use this structure:

```markdown
# Architecture Brief —

**Date:**
**Authors:**
**Status:** Draft | Accepted | Revised

## 1. System

## 2. Stakeholders and Users

- **Primary users:** …
- **Secondary users:** …
- **Operators:** …
- **Integrators / dependent systems:** …

## 3. Scale Targets

| Metric | Launch | +12 months |
|---|---|---|
| RPS | … | … |
| Data volume | … | … |
| Geographic spread | … | … |

## 4. Quality Attributes (Prioritized)

| Rank | Attribute | Notes |
|---|---|---|
| 1 | … | critical driver |
| 2 | … | … |

## 5. Constraints

- **Technical:** …
- **Organizational:** …
- **Legal/Regulatory:** …
- **Cost:** …
- **Timeline:** …

## 6. Trust, Data, and Compliance

- **Trust boundaries:** … (where control changes hands; what authenticates across each)
- **Data classification:** … (sensitivity classes and the paths/stores that carry them)
- **Top abuse cases:** … (attacker goal → architectural mitigation)
- **Privacy constraints:** … (minimization, retention, deletion, residency)
- **Compliance evidence:** … (what must be provable to an auditor)
- **Operational ownership:** … (who owns each component in production)

## 7. Candidates Considered

### Candidate A —

Summary. Tactical patterns composed on it (and where). Fit. Cons. Cost. Open questions.

### Candidate B —

…

## 8. Recommendation

**Chosen paradigm:** …

**Tactical patterns composed onto it:** … (e.g. DDD + CQRS in the orders module)

**Rationale:**
- …

**Trade-offs accepted:**
- …

**Rejected alternatives and why:**
- …

## 9. Open Decisions (Candidates for ADRs)

- [ ] Primary database — driver: …
- [ ] …

## 10. Next Steps

- Run `arch-decide` for each open decision above
- Run `arch-document` to produce the full arc42 document
- Run `arch-evaluate` once implementation begins to audit conformance
```

## Review Checklist

Before marking the brief Accepted:

- [ ] Every quality attribute is ranked; no "all critical"
- [ ] Every constraint is explicit; no "we probably can't"
- [ ] Trust boundaries, data classification, and top abuse cases are documented — not deferred to a later security pass
- [ ] At least 2 candidates considered; each has real cons
- [ ] Recommendation names a single paradigm plus the tactical patterns composed onto it
- [ ] Recommendation names what it's trading away
- [ ] Open decisions listed; each will become an ADR via `arch-decide`
- [ ] Stakeholders have seen and (ideally) approved

## Anti-Patterns

- **"All quality attributes are critical."** They aren't. Force ranking.
- **"Let's go microservices."** Without a ranking, constraints, and team context, that's cargo-culting.
- **Single candidate.** One option means the user already decided; you're documenting, not designing.
- **Paradigm vs. pattern confusion.** Treating DDD, CQRS, or event sourcing as alternatives to a modular monolith or microservices. The paradigm is the structural frame; tactical patterns compose inside it. Pick one paradigm, then layer patterns onto it.
- **Security deferred to later.** Treating trust boundaries, data classification, and abuse cases as a hardening pass after the design is set. They shape where boundaries fall — surface them in Phase 4, not after.
- **Silent trade-offs.** If you choose availability, you're trading latency or cost. Say so.
- **No open decisions.** A real architecture has open questions. Listing none means you haven't looked hard.
- **"Design doc" with no implementation implications.** The brief must be actionable — it drives ADRs, documentation, and evaluation.

## Hand-Off

After the brief is Accepted:

- `arch-decide` — record each Open Decision as an ADR under `docs/adr/`
- `arch-document` — expand into a full arc42-structured document under `docs/architecture/` with C4 diagrams
- `arch-evaluate` — once code exists, audit it against this brief

## Content scope

Any output from this skill that is committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.
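
## Example: Mechanical Checks on a Brief

A few Review Checklist items can be checked mechanically before a human reads the brief. The sketch below is one hypothetical way to do that in Python. The helper names (`missing_sections`, `all_critical`, `count_candidates`) are illustrative assumptions and are not part of this skill or the architecture suite. It assumes the brief keeps the numbered `## N. Title` headings and the `### Candidate …` headings from the template, and that rankings use the 1-5 scale from Phase 2.

```python
import re

# Section titles copied from the brief template in this skill.
REQUIRED_SECTIONS = [
    "System",
    "Stakeholders and Users",
    "Scale Targets",
    "Quality Attributes (Prioritized)",
    "Constraints",
    "Trust, Data, and Compliance",
    "Candidates Considered",
    "Recommendation",
    "Open Decisions (Candidates for ADRs)",
    "Next Steps",
]

# Matches headings of the form "## 4. Quality Attributes (Prioritized)".
_SECTION_HEADING = re.compile(r"^##\s+\d+\.\s+(.+?)\s*$")


def missing_sections(brief_text: str) -> list[str]:
    """Return template sections that have no numbered heading in the brief."""
    found = set()
    for line in brief_text.splitlines():
        match = _SECTION_HEADING.match(line)
        if match:
            found.add(match.group(1))
    return [title for title in REQUIRED_SECTIONS if title not in found]


def all_critical(ranking: dict[str, int]) -> bool:
    """True when every attribute is ranked 1, the 'all critical' anti-pattern."""
    return bool(ranking) and all(rank == 1 for rank in ranking.values())


def count_candidates(brief_text: str) -> int:
    """Count '### Candidate ...' headings; the checklist asks for at least 2."""
    return sum(
        1 for line in brief_text.splitlines() if line.startswith("### Candidate ")
    )


if __name__ == "__main__":
    # Hypothetical draft that stops after section 9.
    draft = "\n".join(
        f"## {number}. {title}"
        for number, title in enumerate(REQUIRED_SECTIONS[:9], start=1)
    )
    print(missing_sections(draft))
    print(all_critical({"Security": 1, "Cost": 1}))
    print(count_candidates(draft))
```

Checks like these only catch structural gaps. Whether each candidate has real cons, or whether the trade-offs are stated honestly, still needs a human reviewer.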