#+TITLE: Test Failure Analysis - VM Test Run 20251108-204202
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2025-11-08

* Test Overview

Test ID: 20251108-204202
Date: 2025-11-08 21:16:11
VM: archsetup-test-20251108-204202
Result: *FAILED* (archsetup exited 0, but validation failed)

* Critical Findings

** PRIMARY ROOT CAUSE: Disk Space Exhausted

The 20GB VM disk ran out of space during package installation:

#+begin_example
error: Partition / too full: 90773 blocks needed, 9323 blocks free
error: not enough free disk space
error: failed to commit transaction (not enough free disk space)
#+end_example

Once the first packages had filled the disk, every later transaction hit the same error, producing a cascade of more than 100 package installation failures.

*Impact:* Most package installation failures
*Severity:* CRITICAL
*Resolution:* ✅ FIXED - Increased VM disk size to 50GB (was 20GB)
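
Because a single cause produced most of the errors in this run, it can help to group log lines by cause before reading them one by one. The following is a minimal Python sketch, not part of archsetup: the function names and category labels are illustrative assumptions, and the patterns are taken only from the log excerpts quoted in this report.

#+begin_src python
import re
from collections import Counter

# Pattern for pacman's disk-full message, as quoted in this report.
DISK_FULL = re.compile(
    r"Partition (?P<mount>\S+) too full: "
    r"(?P<needed>\d+) blocks needed, (?P<free>\d+) blocks free"
)


def block_shortfall(line):
    """Return (mount, missing_blocks) for a disk-full line, else None."""
    match = DISK_FULL.search(line)
    if match is None:
        return None
    return match["mount"], int(match["needed"]) - int(match["free"])


def classify(line):
    """Map one log line to a coarse root-cause label (labels are illustrative)."""
    if "too full" in line or "not enough free disk space" in line:
        return "disk_full"
    if "The requested URL returned error: 504" in line:
        return "git_server_unavailable"
    return "other"


def summarize(lines):
    """Count log lines per root-cause label."""
    return Counter(classify(line) for line in lines)
#+end_src

Applied to the first error line above, =block_shortfall= reports that the transaction was 81450 blocks short on =/=. The block size is not stated in the log, so the sketch does not convert this to bytes.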

** SECONDARY ROOT CAUSE: git.cjennings.net Server Unavailable

DWM, dmenu, and st failed to build due to 504 Gateway Timeout errors:

#+begin_example
Cloning into '/home/cjennings/.local/src/dwm'...
fatal: unable to access 'https://git.cjennings.net/dwm.git/': The requested URL returned error: 504
ERROR: cloning source code for dwm failed with error code 0
#+end_example

*Impact:* DWM validation check failed (critical)
*Severity:* HIGH
*Resolution:* ✅ RESOLVED - git.cjennings.net is working (verified 2025-11-08, transient 504 errors)
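
Since the 504 errors turned out to be transient, retrying the clone a few times before giving up would probably have been enough here. The sketch below is one possible shape for that, written in Python for illustration. It is not how archsetup currently clones; =clone_with_fallback=, its parameters, and the idea of passing a list of URLs are assumptions introduced for this example.

#+begin_src python
import subprocess
import time


def clone_with_fallback(urls, dest, attempts=3, delay=5.0,
                        run=subprocess.run, sleep=time.sleep):
    """Try each URL in order, retrying failed clones with a growing pause.

    Returns the URL that cloned successfully, or None if all attempts failed.
    `run` and `sleep` are injectable so the logic can be tested without git.
    """
    for url in urls:
        for attempt in range(1, attempts + 1):
            result = run(["git", "clone", url, dest])
            if result.returncode == 0:
                return url
            if attempt < attempts:
                # Linear backoff: wait longer after each failed attempt.
                sleep(delay * attempt)
    return None
#+end_src

A caller would pass the primary URL first, for example =https://git.cjennings.net/dwm.git=, followed by any mirror URLs. No mirror exists yet, so a second entry in the list is hypothetical until one is set up.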

** VALIDATION FAILURE: DWM Not Found

Test validation checks:
- ✅ yay is installed
- ❌ DWM not found at /usr/local/bin/dwm

*Cause:* git.cjennings.net 504 errors prevented DWM build
*Impact:* Test marked as FAILED
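
The two checks above can be expressed as a small table of labelled predicates, which makes it easy to add more checks later. This is a hedged Python sketch rather than the actual validation code in the test scripts; =is_executable_file=, =run_checks=, and =CHECKS= are names invented for this example.

#+begin_src python
import os
import shutil


def is_executable_file(path):
    """True if path is a regular file that the current user may execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def run_checks(checks):
    """checks: list of (label, zero-argument callable returning bool).

    Returns the labels of all checks that failed, in order.
    """
    return [label for label, check in checks if not check()]


# The two checks named in this report; further checks would be added here.
CHECKS = [
    ("yay is installed", lambda: shutil.which("yay") is not None),
    ("DWM found at /usr/local/bin/dwm",
     lambda: is_executable_file("/usr/local/bin/dwm")),
]
#+end_src

Calling =run_checks(CHECKS)= inside the VM returns the failed labels, so an empty list means validation passed.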

* Error Summary

Total errors: 134

** Error Categories

*** Git Repository Access (3 errors)
- dwm clone/pull failed (504 error)
- dmenu clone/pull failed (504 error)
- st clone partially succeeded (permission warning)

*** Package Installation Failures (100+ errors)
All caused by disk space exhaustion once the initial packages had been installed.

Examples:
- emacs
- code (VS Code)
- virtualbox
- Many AUR packages (obsidian, warpinator, etc.)
- Standard packages (aspell, imagemagick, ffmpegthumbnailer, etc.)

*** Configuration Failures (3 errors)
- Dotfile restoration failed (error 128)
- Boot menu regeneration failed
- Blue light filter configuration failed

*** Other Errors
- Preparation step for the tidal-dl workaround failed

* Timeline of Failure

1. *20:44* - Dotfile restoration error (early warning sign)
2. *20:46* - Boot menu regeneration failed
3. *20:47-20:49* - git.cjennings.net 504 errors (DWM/dmenu/st)
4. *20:56* - First package failures start (nitrogen)
5. *21:03* - adwaita-color-schemes fails
6. *21:11* - Major package failures begin (disk full):
   - emacs
   - code
   - virtualbox
   - exercism-bin
   - And more than 100 other packages
7. *21:16* - archsetup completes (exit 0)
8. *21:16* - Validation fails (DWM not found)

* Affected Components

** Window Manager (Critical)
- ❌ DWM - Not built (git server error)
- ❌ dmenu - Not built (git server error)
- ⚠️  st - Partially built? (permission warning)

** Development Tools
- ❌ emacs
- ❌ code (VS Code)
- ❌ virtualbox
- ❌ exercism-bin
- ❌ libvips
- ❌ isync

** Desktop Applications
- ❌ obsidian
- ❌ warpinator
- ❌ valent
- ❌ nitrogen (wallpaper setter)
- ❌ foliate
- ❌ mcomix
- ❌ nsxiv

** System Utilities
- ❌ aspell / aspell-en
- ❌ imagemagick
- ❌ ffmpegthumbnailer
- ❌ 7zip
- ❌ fd
- ❌ And many more...

* Resolution Plan

** Immediate Actions (Before Next Test)

1. *✅ DONE - Increase VM Disk Size*
   - ✅ Changed from 20GB → 50GB
   - ✅ Updated create-base-vm.sh
   - ✅ Updated lib/vm-utils.sh
   - ✅ Updated scripts/testing/README.org
   - ✅ Updated docs/testing-strategy.org
   - ⏳ TODO: Re-create base VM

2. *✅ DONE - Verify git.cjennings.net Access*
   - ✅ Server is working (dwm cloned successfully)
   - ✅ 504 errors were transient network issues

3. *TODO - Re-run Test*
   - Re-create base VM with 50GB disk: ./scripts/testing/create-base-vm.sh
   - Run full test: ./scripts/testing/run-test.sh
   - Expected: Far fewer errors; all critical components should build

** Long-term Improvements

1. *Disk Space Monitoring*
   - Add disk usage checks during the archsetup run
   - Warn if less than 5GB of disk space is free
   - Fail fast if insufficient space is detected early

2. *Repository Fallbacks*
   - Mirror critical repos to GitHub
   - Auto-fallback if primary git server unavailable
   - Document required repositories

3. *Better Error Reporting*
   - Distinguish "disk full" from "package doesn't exist"
   - Report root cause clearly
   - Group related failures

4. *Test Scenarios*
   - Add "minimum disk space" test
   - Add "offline installation" test (local package cache)
   - Add "repository unavailable" resilience test

* Lessons Learned

1. *20GB is insufficient* for full archsetup with all packages
   - Base system: ~3-5GB
   - Package downloads: ~5-10GB
   - AUR builds: ~5-10GB (tmpfs in VM?)
   - Installed packages: ~10-15GB
   - *Total needed: 40-50GB minimum* (the upper estimates sum to 40GB; 50GB leaves headroom)
   - *✅ FIXED: Increased to 50GB*

2. *External dependencies are fragile*
   - git.cjennings.net unavailability blocked critical components
   - Need fallback mechanisms
   - Consider hosting mirrors

3. *Cascading failures mask root cause*
   - Disk full caused 100+ package errors
   - Easy to miss the root cause in the noise
   - Better error aggregation needed

4. *Validation checks are essential*
   - archsetup exited 0 (success) but system was broken
   - Validation caught DWM failure
   - Need more validation checks
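
The disk budget in lesson 1 can be checked with a few lines of arithmetic. The sketch below simply sums the rough per-category estimates from that list; the figures are the estimates given above, not measurements.

#+begin_src python
# Rough per-category estimates in GB (low, high), copied from lesson 1.
BUDGET_GB = {
    "base system": (3, 5),
    "package downloads": (5, 10),
    "AUR builds": (5, 10),
    "installed packages": (10, 15),
}


def total_range(budget):
    """Sum the low and high estimates across all categories."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return low, high


low, high = total_range(BUDGET_GB)
print(f"Estimated need: {low}-{high}GB")  # prints "Estimated need: 23-40GB"
#+end_src

Even the low end of the estimate (23GB) exceeds the original 20GB disk, and the high end (40GB) fits in the new 50GB disk with about 10GB to spare.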

* Next Test Expectations

The next run uses the 50GB disk. The git server needs no change, since it was working and the 504 errors were transient.

** Expected Results (with 50GB disk)
- ✅ archsetup exits with code 0
- ✅ User 'cjennings' created
- ✅ Dotfiles are stowed
- ✅ yay is installed
- ✅ DWM is built and installed
- ✅ Most/all packages installed successfully
- ✅ No disk space errors

** Acceptable Failures
- Some deprecated AUR packages may still fail
- Some optional packages may have build issues
- These should total fewer than 10 errors, not 134

* Files Referenced

- Test report: [[file:../test-results/20251108-204202/test-report.txt]]
- Test log: [[file:../test-results/20251108-204202/test.log]]
- archsetup log: [[file:../test-results/20251108-204202/archsetup-2025-11-08-20-42-27.log]]
- Base VM creation: [[file:../test-results/create-base-vm-20251108-182022.log]]
- Auto-install script: [[file:../vm-images/auto-install.sh]]