---
description: Shape the architecture of a new or restructured software project through structured intake (quality attributes, stakeholders, constraints, scale, change drivers), then propose candidate architectural paradigms with honest trade-off analysis and a recommended direction. Paradigm-agnostic — evaluates options across layered, hexagonal, microservices, event-driven, CQRS, modular-monolith, serverless, pipe-and-filter, DDD, and others. Outputs a brief at `.architecture/brief.md` that downstream skills (arch-decide, arch-document, arch-evaluate) read. Use when starting a new project or service, restructuring an existing one, choosing a tech stack, or formalizing architecture before implementation. Do NOT use for bug fixing, code review, small feature additions, documenting an existing architecture (use arch-document), evaluating an existing architecture against a brief (use arch-evaluate), or recording specific individual decisions (use arch-decide). Part of the architecture suite (arch-design / arch-decide / arch-document / arch-evaluate + c4-analyze / c4-diagram for notation-specific diagramming).
disable-model-invocation: true
---

# Architecture Design

Elicit the problem, surface the real drivers, propose a fit, and commit it to writing. One working session yields a `.architecture/brief.md` that anchors every downstream architectural decision.

## When to Use This Skill

- Starting a new project, service, or major subsystem
- Restructuring or re-platforming an existing system
- Selecting a primary tech stack for a green-field effort
- Establishing a formal architecture before a team scales
- Preparing a spike or prototype you want to keep coherent

## When NOT to Use This Skill

- Bug fixing or defect investigation
- Small feature additions inside an existing architecture
- Recording a single decision (use `arch-decide`)
- Producing formal documentation from a known architecture (use `arch-document`)
- Auditing an existing codebase against its stated architecture (use `arch-evaluate`)

## Workflow

The skill runs in six phases. Each produces a section of the output brief. Do not skip phases — the trade-off analysis is only as good as the intake.

1. **Intake** — elicit stakeholders, domain, scale, timeline, team
2. **Quality attributes** — prioritize (can't have everything)
3. **Constraints** — technical, organizational, legal, cost
4. **Trust, data, and compliance** — trust boundaries, data classification, abuse cases, privacy, compliance evidence, ownership
5. **Candidate paradigms** — shortlist 2-4 with honest trade-off analysis
6. **Recommendation** — pick one, justify it, flag open questions for ADRs

## Phase 1 — Intake

Ask the user to answer each. Short answers are fine; vague answers mean return to the question.

**System identity**
- What is this system? (One sentence.)
- Who uses it? Human end users, other services, both?
- What domain is it in? (commerce, health, comms, finance, ops, etc.)

**Scale**
- Expected traffic at launch and in 12 months? (RPS, MAU, payload sizes)
- Data volume at launch and in 12 months? (rows/docs, GB)
- Geographic distribution? (single region, multi-region, edge)

**Team**
- Team size now? Growing?
- Existing language/stack expertise?
- Operational maturity? (on-call rotation, observability tooling, CI/CD)

**Timeline**
- When does it need to be in production?
- Is this replacing something? What's the migration path?

**Change drivers**
- What forces are likely to reshape this system? (new markets, regulatory, volume, organizational)

## Phase 2 — Quality Attributes

List the relevant quality attributes and force the user to rank them 1-5 (1 = critical, 5 = nice-to-have). You can't optimize everything; the ranking surfaces real priorities.

Standard list (use this set; add domain-specific ones if relevant):

- **Performance** (latency, throughput under load)
- **Scalability** (horizontal, vertical, elastic)
- **Availability** (uptime target, failure tolerance)
- **Reliability** (data durability, correctness under partial failure)
- **Security** (authn/z, data protection, threat model)
- **Maintainability** (readability, testability, onboarding speed)
- **Observability** (logs, metrics, tracing, debuggability)
- **Deployability** (release cadence, rollback speed)
- **Cost** (infra, operational, total cost of ownership)
- **Compliance** (regulatory, audit, data residency)
- **Interoperability** (integration with existing systems, APIs, standards)
- **Flexibility / evolvability** (ease of adding features, changing direction)

Document the ranking verbatim in the brief. Future decisions hinge on it.

## Phase 3 — Constraints

Enumerate what's fixed. Each constraint narrows the design space — make them explicit so trade-offs don't hide.

- **Technical**: existing infrastructure, mandated languages/platforms, integration points that can't move
- **Organizational**: team structure (Conway's Law — the org shape becomes the arch shape), existing expertise, hiring plan
- **Legal/regulatory**: GDPR, HIPAA, FedRAMP, data residency, audit retention
- **Cost**: budget ceiling, licensing limits, infra cost targets
- **Timeline**: hard dates from business, regulatory deadlines
- **Compatibility**: backward-compatibility with existing clients, API contracts, data formats

## Phase 4 — Trust, Data, and Compliance

Security is one quality attribute in Phase 2, but several security and privacy concerns shape the structure itself, not just a later hardening pass. Surface them here, before the paradigm shortlist, so the architecture is drawn around them rather than retrofitted by a downstream `security-check`. These answers feed directly into where boundaries fall, how data flows, and which paradigms stay viable.

**Trust boundaries**
- Where does control change hands? (public internet → edge, edge → service mesh, service → data store, tenant → tenant)
- Which components are trusted, which are hostile-by-default, and what authenticates across each boundary?
- What's the blast radius if any one component is compromised?

**Data classification**
- What sensitivity of data flows where? Classify it (public, internal, confidential, regulated — PII, PHI, payment, secrets).
- Which stores and which paths carry the most sensitive class? That path constrains the design more than the average one.
- Where does sensitive data cross a trust boundary, and what protects it there (encryption in transit/at rest, tokenization, field-level controls)?

**Abuse and misuse cases**
- Who would attack this, and to what end? (fraud, exfiltration, denial of service, account takeover, abuse of a feature for an unintended purpose)
- For the top abuse cases, what's the architectural mitigation? (rate limiting at the edge, isolation between tenants, an audit trail that can't be tampered with)
- These are the inverse of user stories — name the attacker's goal, then design against it.

**Privacy constraints**
- What data minimization, retention, deletion, and consent obligations apply? (right to erasure, purpose limitation, data residency)
- Do any of these force a structural choice — separable per-user data, region-pinned stores, a deletion path that reaches every copy?

**Compliance evidence**
- What must be *provable* to an auditor, not just true? (access logs, change history, encryption attestation, data-flow records)
- Generating that evidence is an architectural requirement — an immutable audit log or a tamper-evident event store may be load-bearing, not optional.

**Operational ownership**
- Who owns each component in production — on-call, patching, secret rotation, incident response?
- Conway's Law applies to security too: a boundary no team owns is a boundary no one defends.

Document the trust boundaries, the data classification, and the top abuse cases in the brief. They become inputs to the trade-off analysis in the next phase and seed open decisions for ADRs.

## Phase 5 — Candidate Paradigms

Two distinct choices live here, and conflating them is a classic mistake. First pick the **paradigm** — the top-level structural choice that decides how the system is deployed and how its pieces relate. Then layer on the **tactical patterns** that compose onto that paradigm. They are not mutually exclusive options on one menu. A modular monolith built with DDD and CQRS in one module is a coherent, common choice — not a contradiction.

### Step 1 — Pick the paradigm

The paradigm is a whole-system decision; you choose one. Pick 2-4 candidates that plausibly fit the quality-attribute ranking and constraints. Analyze each honestly — include the reasons it would fail, not just succeed.

| Paradigm | Fits when… | Doesn't fit when… |
|---|---|---|
| **Modular monolith** | Small team, fast iteration, strong module boundaries, single deployment OK | Independent team scaling, different availability SLAs per module, polyglot requirements |
| **Microservices** | Multiple teams, independent deploy cadences, polyglot, different scaling needs | Small team, tight transactional consistency needs, low operational maturity |
| **Layered (n-tier)** | CRUD-heavy, clear request/response, team familiar with MVC | Complex domain logic, event-driven needs, async workflows dominate |
| **Event-driven / pub-sub** | Async workflows, fan-out to multiple consumers, decoupled evolution | Strong ordering + consistency needs, small team, low operational maturity |
| **Serverless (FaaS)** | Event-driven + variable load + fast iteration + accept vendor lock-in | Steady high load, latency-sensitive, long-running processes, tight cost control |
| **Pipe-and-filter / pipeline** | Data transformation workflows, ETL, stream processing | Interactive request/response, low-latency |
| **Space-based / in-memory grid** | Extreme throughput, elastic scale, in-memory OK | Strong durability required from day one, small scale |

### Step 2 — Compose tactical patterns onto it

Tactical patterns refine the internal structure of a chosen paradigm, often per-module or per-service rather than system-wide. You can adopt several at once, or none. They sharpen how a part of the system is built; they don't replace the paradigm. Map each candidate paradigm to the tactical patterns worth applying inside it.

| Tactical pattern | Apply when… | Skip when… |
|---|---|---|
| **DDD (tactical)** | Complex domain, domain experts available, long-lived system | Simple CRUD, no real domain complexity, short-lived system |
| **Hexagonal / Ports & Adapters** | Business logic isolation, multiple interface types (HTTP + CLI + queue), testability priority | Trivial domains, overhead outweighs benefit |
| **CQRS** | Read/write workload asymmetry, different optimization needs, audit trail required | Simple CRUD, no asymmetry, overhead not justified |
| **Event sourcing** | Audit trail critical, temporal queries needed, reconstruction from events valuable | Simple state, team lacks event-sourcing expertise, storage cost prohibitive |

Examples of composition: a modular monolith with hexagonal modules and CQRS on the reporting module; microservices where two services use DDD aggregates and one uses event sourcing for its audit-critical ledger; a pipeline whose stages are hexagonal for testability. The paradigm is the frame; the tactical patterns are how you build inside it.

For each candidate, document:

- **Summary** — one paragraph on what the architecture would look like
- **Tactical patterns** — which patterns from Step 2 compose onto it, and where (which module or service)
- **Why it fits** — map to the ranked quality attributes
- **Why it might not** — the honest cons for this specific context
- **Cost** — team learning curve, operational overhead, infra impact
- **Open questions** — what would need to be answered or decided via ADR

## Phase 6 — Recommendation

Choose one paradigm and name the tactical patterns that compose onto it. Justify it by:

- Which top-3 quality attributes the choice serves
- Which constraints it respects
- How it answers the trust boundaries, data classification, and abuse cases from Phase 4
- Why the rejected alternatives were rejected (not just "we picked X"; say why *not* Y)
- What you're trading away (be explicit)

Flag items that need their own ADRs — use `arch-decide` to record them. Examples:

- Primary database choice
- Message bus vs. direct calls
- Sync vs. async inter-service comms
- Multi-tenancy approach
- Authentication/authorization boundary

## Output: `.architecture/brief.md`

Write the final brief to `.architecture/brief.md` in the project root. Use this structure:

```markdown
# Architecture Brief — <Project Name>

**Date:** <YYYY-MM-DD>
**Authors:** <names>
**Status:** Draft | Accepted | Revised

## 1. System

<One-paragraph description of what the system does.>

## 2. Stakeholders and Users

- **Primary users:** …
- **Secondary users:** …
- **Operators:** …
- **Integrators / dependent systems:** …

## 3. Scale Targets

| Metric | Launch | +12 months |
|---|---|---|
| RPS | … | … |
| Data volume | … | … |
| Geographic spread | … | … |

## 4. Quality Attributes (Prioritized)

| Rank | Attribute | Notes |
|---|---|---|
| 1 | … | critical driver |
| 2 | … | … |

## 5. Constraints

- **Technical:** …
- **Organizational:** …
- **Legal/Regulatory:** …
- **Cost:** …
- **Timeline:** …

## 6. Trust, Data, and Compliance

- **Trust boundaries:** … (where control changes hands; what authenticates across each)
- **Data classification:** … (sensitivity classes and the paths/stores that carry them)
- **Top abuse cases:** … (attacker goal → architectural mitigation)
- **Privacy constraints:** … (minimization, retention, deletion, residency)
- **Compliance evidence:** … (what must be provable to an auditor)
- **Operational ownership:** … (who owns each component in production)

## 7. Candidates Considered

### Candidate A — <paradigm>

Summary. Tactical patterns composed on it (and where). Fit. Cons. Cost. Open questions.

### Candidate B — <paradigm>

…

## 8. Recommendation

**Chosen paradigm:** …

**Tactical patterns composed onto it:** … (e.g. DDD + CQRS in the orders module)

**Rationale:**
- …

**Trade-offs accepted:**
- …

**Rejected alternatives and why:**
- …

## 9. Open Decisions (Candidates for ADRs)

- [ ] Primary database — driver: …
- [ ] …

## 10. Next Steps

- Run `arch-decide` for each open decision above
- Run `arch-document` to produce the full arc42 document
- Run `arch-evaluate` once implementation begins to audit conformance
```

## Review Checklist

Before marking the brief Accepted:

- [ ] Every quality attribute is ranked; no "all critical"
- [ ] Every constraint is explicit; no "we probably can't"
- [ ] Trust boundaries, data classification, and top abuse cases are documented — not deferred to a later security pass
- [ ] At least 2 candidates considered; each has real cons
- [ ] Recommendation names a single paradigm plus the tactical patterns composed onto it
- [ ] Recommendation names what it's trading away
- [ ] Open decisions listed; each will become an ADR via `arch-decide`
- [ ] Stakeholders have seen and (ideally) approved

## Anti-Patterns

- **"All quality attributes are critical."** They aren't. Force ranking.
- **"Let's go microservices."** Without a ranking, constraints, and team context, that's cargo-culting.
- **Single candidate.** One option means the user already decided; you're documenting, not designing.
- **Paradigm vs. pattern confusion.** Treating DDD, CQRS, or event sourcing as alternatives to a modular monolith or microservices. The paradigm is the structural frame; tactical patterns compose inside it. Pick one paradigm, then layer patterns onto it.
- **Security deferred to later.** Treating trust boundaries, data classification, and abuse cases as a hardening pass after the design is set. They shape where boundaries fall — surface them in Phase 4, not after.
- **Silent trade-offs.** If you choose availability, you're trading latency or cost. Say so.
- **No open decisions.** A real architecture has open questions. Listing none means you haven't looked hard.
- **"Design doc" with no implementation implications.** The brief must be actionable — it drives ADRs, documentation, and evaluation.

## Hand-Off

After the brief is Accepted:

- `arch-decide` — record each Open Decision as an ADR under `docs/adr/`
- `arch-document` — expand into a full arc42-structured document under `docs/architecture/` with C4 diagrams
- `arch-evaluate` — once code exists, audit it against this brief

## Content scope

Output this skill produces that gets committed or shared with the team must follow the *Content scope for public artifacts* rule in [`commits.md`](../claude-rules/commits.md): no local paths, no private repo names, no personal tooling references.