arch-design/SKILL.md



---
name: arch-design
description: Shape the architecture of a new or restructured software project through structured intake (quality attributes, stakeholders, constraints, scale, change drivers), then propose candidate architectural paradigms with honest trade-off analysis and a recommended direction. Paradigm-agnostic — evaluates options across layered, hexagonal, microservices, event-driven, CQRS, modular-monolith, serverless, pipe-and-filter, DDD, and others. Outputs a brief at `.architecture/brief.md` that downstream skills (arch-decide, arch-document, arch-evaluate) read. Use when starting a new project or service, restructuring an existing one, choosing a tech stack, or formalizing architecture before implementation. Do NOT use for bug fixing, code review, small feature additions, documenting an existing architecture (use arch-document), evaluating an existing architecture against a brief (use arch-evaluate), or recording specific individual decisions (use arch-decide). Part of the arch-* suite (arch-design / arch-decide / arch-document / arch-evaluate).
---

# Architecture Design

Elicit the problem, surface the real drivers, propose a fit, and commit it to writing. One working session yields a `.architecture/brief.md` that anchors every downstream architectural decision.

## When to Use This Skill

- Starting a new project, service, or major subsystem
- Restructuring or re-platforming an existing system
- Selecting a primary tech stack for a green-field effort
- Establishing a formal architecture before a team scales
- Preparing a spike or prototype you want to keep coherent

## When NOT to Use This Skill

- Bug fixing or defect investigation
- Small feature additions inside an existing architecture
- Recording a single decision (use `arch-decide`)
- Producing formal documentation from a known architecture (use `arch-document`)
- Auditing an existing codebase against its stated architecture (use `arch-evaluate`)

## Workflow

The skill runs in five phases. Each produces a section of the output brief. Do not skip phases — the trade-off analysis is only as good as the intake.

1. **Intake** — elicit stakeholders, domain, scale, timeline, team
2. **Quality attributes** — prioritize (can't have everything)
3. **Constraints** — technical, organizational, legal, cost
4. **Candidate paradigms** — shortlist 2-4 with honest trade-off analysis
5. **Recommendation** — pick one, justify it, flag open questions for ADRs

## Phase 1 — Intake

Ask the user to answer each question. Short answers are fine; if an answer is vague, return to the question.

**System identity**
- What is this system? (One sentence.)
- Who uses it? Human end users, other services, both?
- What domain is it in? (commerce, health, comms, finance, ops, etc.)

**Scale**
- Expected traffic at launch and in 12 months? (RPS, MAU, payload sizes)
- Data volume at launch and in 12 months? (rows/docs, GB)
- Geographic distribution? (single region, multi-region, edge)

**Team**
- Team size now? Growing?
- Existing language/stack expertise?
- Operational maturity? (on-call rotation, observability tooling, CI/CD)

**Timeline**
- When does it need to be in production?
- Is this replacing something? What's the migration path?

**Change drivers**
- What forces are likely to reshape this system? (new markets, regulatory, volume, organizational)

## Phase 2 — Quality Attributes

List the relevant quality attributes and have the user rate each on a 1-5 scale (1 = critical, 5 = nice-to-have), then order them by that rating. You can't optimize everything; the ranking surfaces real priorities.

Standard list (use this set; add domain-specific ones if relevant):

- **Performance** (latency, throughput under load)
- **Scalability** (horizontal, vertical, elastic)
- **Availability** (uptime target, failure tolerance)
- **Reliability** (data durability, correctness under partial failure)
- **Security** (authn/z, data protection, threat model)
- **Maintainability** (readability, testability, onboarding speed)
- **Observability** (logs, metrics, tracing, debuggability)
- **Deployability** (release cadence, rollback speed)
- **Cost** (infra, operational, total cost of ownership)
- **Compliance** (regulatory, audit, data residency)
- **Interoperability** (integration with existing systems, APIs, standards)
- **Flexibility / evolvability** (ease of adding features, changing direction)

Document the ranking verbatim in the brief. Future decisions hinge on it.
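
The ranking rules can be checked mechanically before they go into the brief. The following is a minimal sketch, not part of the skill itself: the sample ratings, the function names `check_ratings` and `prioritized`, and the `max_critical=3` limit are all assumptions chosen for illustration.

```python
# Hypothetical ratings on the 1-5 scale (1 = critical, 5 = nice-to-have).
ratings = {
    "Performance": 2,
    "Scalability": 3,
    "Availability": 1,
    "Security": 1,
    "Maintainability": 2,
    "Cost": 4,
}


def check_ratings(ratings, max_critical=3):
    """Return a list of problems with a set of 1-5 ratings (empty if none)."""
    problems = []
    for name, score in ratings.items():
        if score not in range(1, 6):
            problems.append(f"{name}: rating {score} is outside 1-5")
    # Assumption: more than `max_critical` attributes at 1 means no real ranking.
    critical = [name for name, score in ratings.items() if score == 1]
    if len(critical) > max_critical:
        problems.append(
            f"{len(critical)} attributes rated critical; limit is {max_critical}"
        )
    if len(ratings) > 1 and len(set(ratings.values())) == 1:
        problems.append("every attribute has the same rating; no priority signal")
    return problems


def prioritized(ratings):
    """Order attributes from most to least critical, ties broken by name."""
    return sorted(ratings, key=lambda name: (ratings[name], name))


print(check_ratings(ratings))  # prints []
print(prioritized(ratings)[:3])  # prints ['Availability', 'Security', 'Maintainability']
```

A non-empty result from `check_ratings` is a cue to go back to the user, not something to fix silently.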

## Phase 3 — Constraints

Enumerate what's fixed. Each constraint narrows the design space — make them explicit so trade-offs don't hide.

- **Technical**: existing infrastructure, mandated languages/platforms, integration points that can't move
- **Organizational**: team structure (Conway's Law — the org shape becomes the arch shape), existing expertise, hiring plan
- **Legal/regulatory**: GDPR, HIPAA, FedRAMP, data residency, audit retention
- **Cost**: budget ceiling, licensing limits, infra cost targets
- **Timeline**: hard dates from business, regulatory deadlines
- **Compatibility**: backward-compatibility with existing clients, API contracts, data formats

## Phase 4 — Candidate Paradigms

Pick 2-4 candidates that plausibly fit the quality-attribute ranking and constraints. Analyze each honestly — include the reasons it would fail, not just succeed.

Common paradigms to consider:

| Paradigm | Fits when… | Doesn't fit when… |
|---|---|---|
| **Modular monolith** | Small team, fast iteration, strong module boundaries, single deployment OK | Independent team scaling, different availability SLAs per module, polyglot requirements |
| **Microservices** | Multiple teams, independent deploy cadences, polyglot, different scaling needs | Small team, tight transactional consistency needs, low operational maturity |
| **Layered (n-tier)** | CRUD-heavy, clear request/response, team familiar with MVC | Complex domain logic, event-driven needs, async workflows dominate |
| **Hexagonal / Ports & Adapters** | Business logic isolation, multiple interface types (HTTP + CLI + queue), testability priority | Trivial domains, overhead outweighs benefit |
| **Event-driven / pub-sub** | Async workflows, fan-out to multiple consumers, decoupled evolution | Strong ordering + consistency needs, small team, low operational maturity |
| **CQRS** | Read/write workload asymmetry, different optimization needs, audit trail required | Simple CRUD, no asymmetry, overhead not justified |
| **Event sourcing** | Audit trail critical, temporal queries needed, reconstruction from events valuable | Simple state, team lacks event-sourcing expertise, storage cost prohibitive |
| **Serverless (FaaS)** | Event-driven, variable load, fast iteration, vendor lock-in acceptable | Steady high load, latency-sensitive, long-running processes, tight cost control |
| **Pipe-and-filter / pipeline** | Data transformation workflows, ETL, stream processing | Interactive request/response, low-latency |
| **Space-based / in-memory grid** | Extreme throughput, elastic scale, in-memory OK | Strong durability required from day one, small scale |
| **DDD (tactical)** | Complex domain, domain experts available, long-lived system | Simple CRUD, no real domain complexity, short-lived system |

For each candidate, document:

- **Summary** — one paragraph on what the architecture would look like
- **Why it fits** — map to the ranked quality attributes
- **Why it might not** — the honest cons for this specific context
- **Cost** — team learning curve, operational overhead, infra impact
- **Open questions** — what would need to be answered or decided via ADR
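
One way to keep the per-candidate fields together and to make "map to the ranked quality attributes" concrete is sketched below. The `Candidate` class, the -1/0/+1 fit scale, and the `6 - rating` weighting are assumptions for illustration; the skill does not prescribe a scoring formula, and the resulting number is a conversation aid rather than a substitute for the written trade-off analysis.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    paradigm: str
    summary: str
    # Hypothetical fit scale per attribute: -1 hurts, 0 neutral, +1 helps.
    fit: dict
    cons: list = field(default_factory=list)
    cost: str = ""
    open_questions: list = field(default_factory=list)


def weighted_fit(candidate, ratings):
    """Sum fit scores, weighting each attribute by 6 - rating (critical = 5)."""
    return sum(
        (6 - rating) * candidate.fit.get(attr, 0) for attr, rating in ratings.items()
    )


def missing_cons(candidates):
    """Name the candidates that list no cons, which the analysis requires."""
    return [c.paradigm for c in candidates if not c.cons]


# Hypothetical ratings (1 = critical, 5 = nice-to-have) and two candidates.
ratings = {"Maintainability": 1, "Deployability": 2, "Scalability": 4}
monolith = Candidate(
    paradigm="Modular monolith",
    summary="Single deployable with enforced module boundaries.",
    fit={"Maintainability": 1, "Deployability": 1, "Scalability": -1},
    cons=["Modules cannot scale independently"],
)
microservices = Candidate(
    paradigm="Microservices",
    summary="Independently deployed services per bounded context.",
    fit={"Maintainability": -1, "Deployability": 1, "Scalability": 1},
)

print(weighted_fit(monolith, ratings))  # prints 7
print(weighted_fit(microservices, ratings))  # prints 1
print(missing_cons([monolith, microservices]))  # prints ['Microservices']
```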

## Phase 5 — Recommendation

Choose one paradigm. Justify it by:

- Which top-3 quality attributes the choice serves
- Which constraints it respects
- Why the rejected alternatives were rejected (not just "we picked X"; say why *not* Y)
- What you're trading away (be explicit)

Flag items that need their own ADRs — use `arch-decide` to record them. Examples:

- Primary database choice
- Message bus vs. direct calls
- Sync vs. async inter-service comms
- Multi-tenancy approach
- Authentication/authorization boundary

## Output: `.architecture/brief.md`

Write the final brief to `.architecture/brief.md` in the project root. Use this structure:

```markdown
# Architecture Brief — <Project Name>

**Date:** <YYYY-MM-DD>
**Authors:** <names>
**Status:** Draft | Accepted | Revised

## 1. System

<One-paragraph description of what the system does.>

## 2. Stakeholders and Users

- **Primary users:** …
- **Secondary users:** …
- **Operators:** …
- **Integrators / dependent systems:** …

## 3. Scale Targets

| Metric | Launch | +12 months |
|---|---|---|
| RPS | … | … |
| Data volume | … | … |
| Geographic spread | … | … |

## 4. Quality Attributes (Prioritized)

| Rank | Attribute | Notes |
|---|---|---|
| 1 | … | critical driver |
| 2 | … | … |

## 5. Constraints

- **Technical:** …
- **Organizational:** …
- **Legal/Regulatory:** …
- **Cost:** …
- **Timeline:** …
- **Compatibility:** …

## 6. Candidates Considered

### Candidate A — <paradigm>

Summary. Fit. Cons. Cost. Open questions.

### Candidate B — <paradigm>

…

## 7. Recommendation

**Chosen paradigm:** …

**Rationale:**
- …

**Trade-offs accepted:**
- …

**Rejected alternatives and why:**
- …

## 8. Open Decisions (Candidates for ADRs)

- [ ] Primary database — driver: …
- [ ] …

## 9. Next Steps

- Run `arch-decide` for each open decision above
- Run `arch-document` to produce the full arc42 document
- Run `arch-evaluate` once implementation begins to audit conformance
```
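
If the brief is generated by a script rather than typed by hand, the header and file location from the template above could be produced as follows. This is a sketch under assumptions: `render_header` and `write_brief` are hypothetical helper names, and only the header fields are rendered here.

```python
from datetime import date
from pathlib import Path

# The three statuses named in the template.
ALLOWED_STATUSES = ("Draft", "Accepted", "Revised")


def render_header(project, authors, status="Draft", today=None):
    """Render the title and metadata lines of the brief."""
    if status not in ALLOWED_STATUSES:
        raise ValueError(f"status must be one of {ALLOWED_STATUSES}, got {status!r}")
    today = today or date.today()
    return (
        f"# Architecture Brief — {project}\n\n"
        f"**Date:** {today.isoformat()}\n"
        f"**Authors:** {', '.join(authors)}\n"
        f"**Status:** {status}\n"
    )


def write_brief(project_root, content):
    """Write content to <project_root>/.architecture/brief.md and return the path."""
    path = Path(project_root) / ".architecture" / "brief.md"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")
    return path


# Hypothetical project name and author; nothing is written to disk here.
header = render_header("Orders Service", ["A. Author"], today=date(2025, 1, 15))
print(header)
```

Calling `write_brief(".", header + body)` from the project root would place the file where the downstream skills expect it.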

## Review Checklist

Before marking the brief Accepted:

- [ ] Every quality attribute is ranked; no "all critical"
- [ ] Every constraint is explicit; no "we probably can't"
- [ ] At least 2 candidates considered; each has real cons
- [ ] Recommendation names what it's trading away
- [ ] Open decisions listed; each will become an ADR via `arch-decide`
- [ ] Stakeholders have seen and (ideally) approved
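
Some of these checks are mechanical and could be automated. The sketch below assumes the brief follows the template's headings exactly; `lint_brief` is a hypothetical helper, and it cannot judge whether cons are "real" or whether stakeholders have approved, so those items still need a human.

```python
import re

# Section titles taken from the brief template; any section number is accepted.
REQUIRED_SECTIONS = [
    "System",
    "Stakeholders and Users",
    "Scale Targets",
    "Quality Attributes (Prioritized)",
    "Constraints",
    "Candidates Considered",
    "Recommendation",
    "Open Decisions (Candidates for ADRs)",
    "Next Steps",
]


def lint_brief(text):
    """Return mechanical problems found in a brief (empty list if none)."""
    problems = []
    headings = [
        h.strip() for h in re.findall(r"^## \d+\. (.+)$", text, flags=re.MULTILINE)
    ]
    for section in REQUIRED_SECTIONS:
        if section not in headings:
            problems.append(f"missing section: {section}")
    # The checklist asks for at least 2 candidates.
    if len(re.findall(r"^### Candidate ", text, flags=re.MULTILINE)) < 2:
        problems.append("fewer than 2 candidates considered")
    if not re.search(r"^\*\*Trade-offs accepted:\*\*", text, flags=re.MULTILINE):
        problems.append("recommendation does not list trade-offs accepted")
    if not re.search(r"^- \[[ xX]\] ", text, flags=re.MULTILINE):
        problems.append("no open decisions listed")
    return problems


print(lint_brief("## 1. System\n\nA one-paragraph description.\n"))
```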

## Anti-Patterns

- **"All quality attributes are critical."** They aren't. Force ranking.
- **"Let's go microservices."** Without a ranking, constraints, and team context, that's cargo-culting.
- **Single candidate.** One option means the user already decided; you're documenting, not designing.
- **Silent trade-offs.** If you prioritize availability, you're trading away latency or cost. Say so.
- **No open decisions.** A real architecture has open questions. Listing none means you haven't looked hard.
- **"Design doc" with no implementation implications.** The brief must be actionable — it drives ADRs, documentation, and evaluation.

## Hand-Off

After the brief is Accepted:

- `arch-decide` — record each Open Decision as an ADR under `docs/adr/`
- `arch-document` — expand into a full arc42-structured document under `docs/architecture/` with C4 diagrams
- `arch-evaluate` — once code exists, audit it against this brief
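
To feed the Open Decisions into `arch-decide` one at a time, the checkbox items can be pulled out of the brief. The sketch below assumes the template's `- [ ] <title> — driver: <text>` item shape; `open_decisions` is a hypothetical helper, and treating a checked box as "already recorded" is an assumption, not something the skill defines.

```python
import re


def open_decisions(brief_text):
    """Return (title, driver) pairs for unchecked items in the Open Decisions section."""
    decisions = []
    in_section = False
    for line in brief_text.splitlines():
        if line.startswith("## "):
            # Entering or leaving a top-level section of the brief.
            in_section = "Open Decisions" in line
            continue
        if not in_section:
            continue
        match = re.match(r"- \[ \] (.+)", line.strip())
        if match:
            title, _, driver = match.group(1).partition("driver:")
            # Drop the dash that separates the title from the driver.
            decisions.append((title.rstrip(" —-").strip(), driver.strip()))
    return decisions


# Hypothetical excerpt of a filled-in brief.
sample = """\
## 8. Open Decisions (Candidates for ADRs)

- [ ] Primary database — driver: durability ranked 1
- [x] Message bus vs. direct calls — driver: fan-out to consumers
- [ ] Multi-tenancy approach

## 9. Next Steps

- [ ] not a decision
"""

for title, driver in open_decisions(sample):
    print(title, "|", driver)
```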