#+TITLE: Test-Driven Quality Engineering Session Process
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2025-11-01
* Overview
This document describes a comprehensive test-driven quality engineering session process applicable to any source code module. The session demonstrates systematic testing practices, refactoring for testability, bug discovery through tests, and decision-making processes when tests fail.
* Session Goals
1. Add comprehensive unit test coverage for testable functions in your module
2. Discover and fix bugs through systematic testing
3. Follow quality engineering principles from =ai-prompts/quality-engineer.org=
4. Demonstrate refactoring patterns for testability
5. Document the decision-making process for test vs production code issues
* Phase 1: Feature Addition with Testability in Mind
** The Feature Request
Add new functionality that requires user interaction combined with business logic.
Example requirements:
- Present user with options (e.g., interactive selection)
- Allow cancellation
- Perform an operation with the selected input
- Provide clear success/failure feedback
** Refactoring for Testability
Following the "Interactive vs Non-Interactive Function Pattern" from =quality-engineer.org=:
*Problem:* Directly implementing as an interactive function would require:
- Mocking user interface components
- Mocking framework-specific APIs
- Testing UI functionality, not core business logic
*Solution:* Split into two functions:
1. *Helper Function* (internal implementation):
- Pure, deterministic
- Takes explicit parameters
- No user interaction
- Returns values or signals errors naturally
- 100% testable, no mocking needed
2. *Interactive Wrapper* (public interface):
- Thin layer handling only user interaction
- Gets input from user/context
- Presents UI (prompts, selections, etc.)
- Catches errors and displays messages
- Delegates all business logic to helper
- No tests needed (testing it would only exercise framework UI)
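As a minimal sketch of this split (Python, using a hypothetical "remove entry" feature; every name below is illustrative rather than taken from any real module):
#+begin_src python
def _remove_entry(entries, name):
    """Pure helper: explicit parameters, no prompting, no UI."""
    if name not in entries:
        raise ValueError(f"no such entry: {name}")
    return [entry for entry in entries if entry != name]

def remove_entry_interactive(entries):
    """Thin interactive wrapper: gathers input, delegates, reports."""
    name = input(f"Entry to remove ({', '.join(entries)}): ").strip()
    if not name:                              # empty answer means cancel
        print("Cancelled.")
        return entries
    try:
        remaining = _remove_entry(entries, name)
        print(f"Removed {name}.")
        return remaining
    except ValueError as err:
        print(f"Error: {err}")
        return entries
#+end_src
The helper is the only part that needs tests; the wrapper merely forwards user input to it.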
** Benefits of This Pattern
From =quality-engineer.org=:
#+begin_quote
When writing functions that combine business logic with user interaction:
- Split into internal implementation and interactive wrapper
- Internal function: Pure logic, takes all parameters explicitly
- Dramatically simpler testing (no interactive mocking)
- Code reusable programmatically without prompts
- Clear separation of concerns (logic vs UI)
#+end_quote
This pattern enables:
- Zero mocking in tests
- Fast, deterministic tests
- Easy reasoning about correctness
- Reusable helper function
* Phase 2: Writing the First Test
** Test File Naming
Following the naming convention from =quality-engineer.org=:
- Pattern: =test-<module>-<function>.<ext>=
- One test file per function for easy discovery when tests fail
- Developer sees failure → immediately knows which file to open
** Test Organization
Following the three-category structure:
*** Normal Cases
- Standard expected inputs
- Common use case scenarios
- Happy path operations
- Multiple operations in sequence
*** Boundary Cases
- Very long inputs
- Unicode characters (中文, emoji)
- Special characters and edge cases
- Empty or minimal data
- Maximum values
*** Error Cases
- Invalid inputs
- Nonexistent resources
- Permission denied scenarios
- Wrong type of input
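Bringing the three categories together, a sketch of how they translate into concrete tests (Python =unittest=; =slugify= is a hypothetical function invented for illustration):
#+begin_src python
import unittest

def slugify(title):
    """Hypothetical function under test."""
    if not isinstance(title, str) or not title.strip():
        return None
    return "-".join(title.lower().split())

class TestSlugify(unittest.TestCase):
    # Normal case: standard expected input
    def test_normal_simple_title_returns_slug(self):
        self.assertEqual(slugify("Hello World"), "hello-world")

    # Boundary case: Unicode input is preserved
    def test_boundary_unicode_title_preserved(self):
        self.assertEqual(slugify("中文 标题"), "中文-标题")

    # Error case: nil/None input handled gracefully
    def test_error_none_input_returns_none(self):
        self.assertIsNone(slugify(None))

if __name__ == "__main__":
    unittest.main()
#+end_src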
** Writing Tests with Zero Mocking
Key principle: "Don't mock what you're testing" (from =quality-engineer.org=)
Example test structure:
#+begin_src python
def test_function_normal_case_expected_result():
    setup()                              # create real test resources
    try:
        # Arrange
        input_data = create_test_data()
        expected_output = define_expected_result()
        # Act
        actual_output = function_under_test(input_data)
        # Assert
        assert actual_output == expected_output
    finally:
        teardown()                       # always clean up
#+end_src
Key characteristics:
- No mocks for the function being tested
- Real resources (files, data structures) using test utilities
- Tests actual function behavior
- Clean setup/teardown
- Clear arrange-act-assert structure
** Result
When helper functions are well-factored and deterministic, tests often pass on first run.
* Phase 3: Systematic Test Coverage Analysis
** Identifying Testable Functions
Review all functions in your module and categorize by testability:
*** Easy to Test (Pure/Deterministic)
- Input validation functions
- String manipulation/formatting
- Data structure transformations
- File parsing (read-only operations)
- Configuration/option processing
*** Medium Complexity (Need External Resources)
- File I/O operations
- Recursive algorithms
- Data structure generation
- Cache or state management
*** Hard to Test (Framework/Context Dependencies)
- Functions requiring specific runtime environment
- UI/buffer/window management
- Functions tightly coupled to framework internals
- Functions requiring complex mocking setup
*Decision:* Test easy and medium complexity functions. Skip framework-dependent functions that would require extensive mocking/setup (diminishing returns).
** File Organization Principle
From =quality-engineer.org=:
#+begin_quote
*Unit Tests*: One file per method
- Naming: =test-<filename>-<methodname>.<ext>=
- Example: =test-module--function.ext=
#+end_quote
*Rationale:* When a test fails in CI:
1. Developer sees: =test-module--function-normal-case-returns-result FAILED=
2. Immediately knows: Look for =test-module--function.<ext>=
3. Opens file and fixes issue - *fast cognitive path*
If combined files:
1. Test fails: =test-module--function-normal-case-returns-result FAILED=
2. Which file? =test-module--helpers.<ext>=? =test-module--combined.<ext>=?
3. Developer wastes time searching - *slower, frustrating*
*The 1:1 mapping is a usability feature for developers under pressure.*
* Phase 4: Testing Function by Function
** Example 1: Input Validation Function
*** Test Categories
*Normal Cases:*
- Valid inputs
- Case variations
- Common use cases
*Boundary Cases:*
- Edge cases in input format
- Multiple delimiters or separators
- Empty or minimal input
- Very long input
*Error Cases:*
- Nil/null input
- Wrong type
- Malformed input
*** First Run: Most Passed, Some FAILED
*Example Failure:*
#+begin_src
test-module--validate-input-error-nil-input-returns-nil
Expected: Returns nil gracefully
Actual: (TypeError/NullPointerException) - CRASHED
#+end_src
*** Bug Analysis: Test or Production Code?
*Process:*
1. Read the test expectation: "nil input returns nil/false gracefully"
2. Read the production code:
#+begin_src python
def validate_input(input):
    extension = get_extension(input)   # ← Crashes here on nil/null
    return extension in valid_extensions
#+end_src
3. Identify issue: Function expects string, crashes on nil/null
4. Consider context: This is defensive validation code, called in various contexts
*Decision: Fix production code*
*Rationale:*
- Function should be defensive (validation code)
- Returning false/nil for invalid input is more robust than crashing
- Common pattern in validation functions
- Better user experience
*Fix:*
#+begin_src python
def validate_input(input):
    if input is None or not isinstance(input, str):  # ← Guard added
        return False
    extension = get_extension(input)
    return extension in valid_extensions
#+end_src
Result: All tests pass after adding defensive checks.
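The error-case tests that exposed the crash might look like this (a sketch against the guarded version above):
#+begin_src python
def test_validate_input_error_nil_input_returns_false():
    # Before the guard, this raised an exception inside get_extension();
    # with the guard, invalid input is rejected gracefully.
    assert validate_input(None) is False

def test_validate_input_error_wrong_type_returns_false():
    assert validate_input(42) is False
#+end_src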
** Example 2: Another Validation Function
*** First Run: Most Passed, Multiple FAILED
*Failures:*
1. Nil input crashed (same pattern as previous function)
2. Empty string returned unexpected value (edge case not handled)
*Fix:*
#+begin_src python
def validate_resource(resource):
    # Guards added for nil/null and empty string
    if not resource or not isinstance(resource, str) or resource.strip() == "":
        return False
    # Original validation logic
    return is_valid_resource(resource) and meets_criteria(resource)
#+end_src
Result: All tests pass after adding comprehensive guards.
** Example 3: String Sanitization Function
*** First Run: Most Passed, 1 FAILED
*Failure:*
#+begin_src
test-module--sanitize-boundary-special-chars-replaced
Expected: "output__________" (10 underscores)
Actual: "output_________" (9 underscores)
#+end_src
*** Bug Analysis: Test or Production Code?
*Process:*
1. Count special chars in test input: 9 characters
2. Test expected 10 replacements, but input only has 9
3. Production code is working correctly
*Decision: Fix test code*
*The bug was in the test expectation, not the implementation.*
Result: All tests pass after correcting test expectations.
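A sketch of the corrected test (the =sanitize= implementation and its character set are hypothetical, but the lesson is the same: count the characters in the input before writing the expectation):
#+begin_src python
import re

def sanitize(name):
    """Hypothetical sanitizer: replace each disallowed character with '_'."""
    return re.sub(r'[<>:"/\\|?*]', "_", name)

def test_sanitize_boundary_special_chars_replaced():
    raw = 'output<>:"/\\|?*'        # exactly 9 special characters
    # The original expectation assumed 10 underscores; the input only
    # contains 9 special characters, so 9 is correct.
    assert sanitize(raw) == "output" + "_" * 9
#+end_src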
** Example 4: File/Data Parser Function
This is where a *significant bug* was discovered through testing!
*** Test Categories
*Normal Cases:*
- Absolute paths/references
- Relative paths (expanded to base directory)
- URLs/URIs preserved as-is
- Mixed types of references
*Boundary Cases:*
- Empty lines ignored
- Whitespace-only lines ignored
- Comments ignored (format-specific)
- Leading/trailing whitespace trimmed
- Order preserved
*Error Cases:*
- Nonexistent file
- Nil/null input
*** First Run: Majority Passed, Multiple FAILED
All failures related to URL/URI handling:
*Failure Pattern:*
#+begin_src
Expected: "http://example.com/resource"
Actual: "/base/path/http:/example.com/resource"
#+end_src
URLs were being treated as relative paths and corrupted!
*** Root Cause Analysis
*Production code:*
#+begin_src python
if re.match(r"^\(https?|mms\)://", line):  # Pattern detection
    ...  # Handle as URL
#+end_src
*Problem:* The pattern never matches!
The regex contains an escaping error (the parentheses are escaped, so the intended group becomes literal characters):
- Valid URLs fail to match the pattern
- Every URL falls through to the "relative path" handler and is corrupted
*Correct version:*
#+begin_src python
if re.match(r"^(https?|mms)://", line):  # Fixed pattern
    ...  # Handle as URL
#+end_src
Common causes of this type of bug:
- String escaping issues in the language
- Incorrect regex syntax
- Copy-paste errors in patterns
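A sketch of the corrected parsing branch and the test that caught the bug (Python; =parse_line= and the base-directory handling are illustrative, not the module's actual code):
#+begin_src python
import os
import re

URL_PATTERN = re.compile(r"^(https?|mms)://")    # corrected pattern

def parse_line(line, base_dir):
    """Return the URL unchanged, or expand a relative entry against base_dir."""
    line = line.strip()
    if URL_PATTERN.match(line):
        return line                              # URLs preserved as-is
    return os.path.join(base_dir, line)          # relative entries expanded

def test_parse_line_normal_url_preserved():
    assert parse_line("http://example.com/resource", "/base/path") == \
        "http://example.com/resource"

def test_parse_line_normal_relative_entry_expanded():
    assert parse_line("local/file.txt", "/base/path") == "/base/path/local/file.txt"
#+end_src
With the buggy escaped-parenthesis pattern, the first test fails exactly as observed: the URL is joined onto the base directory instead of being preserved.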
*** Impact Assessment
*This is a significant bug:*
- Remote resources (URLs) would be broken
- Data corruption: URLs transformed into invalid paths
- Function worked for local/simple cases, so bug went unnoticed
- Users would see mysterious errors when using remote resources
- Potential data loss or corruption in production
*Tests caught a real production bug that could have caused user data corruption!*
Result: All tests pass after fixing the pattern matching logic.
* Phase 5: Continuing Through the Test Suite
** Additional Functions Tested Successfully
As testing continues through the module, patterns emerge:
*Function: Directory/File Listing*
- Learning: Directory listing order may be filesystem-dependent
- Solution: Sort results before comparing in tests (see the sketch at the end of this section)
*Function: Data Extraction*
- Keep as separate test file (don't combine with related functions)
- Reason: Usability when tests fail
*Function: Recursive Operations*
- Medium complexity: Required creating test data structures/trees
- Use test utilities for setup/teardown
- Well-factored functions often pass all tests initially
*Function: Higher-Order Functions*
- Test functions that return functions/callbacks
- Initially may misunderstand framework/protocol behavior
- Fix test expectations to match actual framework behavior
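To illustrate the directory-listing learning above, a sketch using only the Python standard library (=list_entries= is a hypothetical function under test):
#+begin_src python
import os
import shutil
import tempfile

def list_entries(directory):
    """Hypothetical function under test: list file names in a directory."""
    return [name for name in os.listdir(directory)
            if os.path.isfile(os.path.join(directory, name))]

def test_list_entries_normal_returns_all_files():
    tmp = tempfile.mkdtemp()                 # real directory, no mocking
    try:
        for name in ("b.txt", "a.txt", "c.txt"):
            open(os.path.join(tmp, name), "w").close()
        # os.listdir() order is filesystem-dependent, so sort before comparing
        assert sorted(list_entries(tmp)) == ["a.txt", "b.txt", "c.txt"]
    finally:
        shutil.rmtree(tmp)                   # clean teardown
#+end_src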
* Key Principles Applied
** 1. Refactor for Testability BEFORE Writing Tests
The Interactive vs Non-Interactive pattern from =quality-engineer.org= made testing trivial:
- No mocking required
- Fast, deterministic tests
- Clear separation of concerns
** 2. Systematic Test Organization
Every test file followed the same structure:
- Normal Cases
- Boundary Cases
- Error Cases
This makes it easy to:
- Identify coverage gaps
- Add new tests
- Understand what's being tested
** 3. Test Naming Convention
Pattern: =test-<module>-<function>-<category>-<scenario>-<expected-result>=
Examples:
- =test-module--validate-input-normal-valid-extension-returns-true=
- =test-module--parse-data-boundary-empty-lines-ignored=
- =test-module--sanitize-error-nil-input-signals-error=
Benefits:
- Self-documenting
- Easy to understand what failed
- Searchable/grepable
- Clear category organization
** 4. Zero Mocking for Pure Functions
From =quality-engineer.org=:
#+begin_quote
DON'T MOCK WHAT YOU'RE TESTING
- Only mock external side-effects and dependencies, not the domain logic itself
- If mocking removes the actual work the function performs, you're testing the mock
- Use real data structures that the function is designed to operate on
- Rule of thumb: If the function body could be =(error "not implemented")= and tests still pass, you've over-mocked
#+end_quote
Our tests used:
- Real file I/O
- Real strings
- Real data structures
- Actual function behavior
Result: Tests caught real bugs, not mock configuration issues.
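To make the rule of thumb concrete, here is a sketch (Python with =unittest.mock=) contrasting an over-mocked test, which would pass even if the function were broken, with a test that exercises real behavior:
#+begin_src python
from unittest.mock import patch

def count_words(text):
    """Hypothetical function under test."""
    return len(text.split())

# Over-mocked: the patch replaces the very behavior being tested, so the
# assertion only checks the mock's configuration, not count_words().
def test_count_words_over_mocked():
    with patch(__name__ + ".count_words", return_value=3):
        assert count_words("ignored input") == 3   # passes even if broken

# Real test: feeds real data and checks actual behavior.
def test_count_words_normal_three_words_returns_three():
    assert count_words("one two three") == 3
#+end_src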
** 5. Test vs Production Code Bug Decision Framework
When a test fails, ask:
1. *What is the test expecting?*
- Read the test name and assertions
- Understand the intended behavior
2. *What is the production code doing?*
- Read the implementation
- Trace through the logic
3. *Which is correct?*
- Is the test expectation reasonable?
- Is the production behavior defensive/robust?
- What is the usage context?
4. *Consider the impact:*
- Defensive code: Fix production to handle edge cases
- Wrong expectation: Fix test
- Unclear spec: Ask user for clarification
Examples from our session:
- *Nil input crashes* → Fix production (defensive coding)
- *Empty string treated as valid* → Fix production (defensive coding)
- *Wrong count in test* → Fix test (test bug)
- *Regex escaping wrong* → Fix production (real bug!)
** 6. Fast Feedback Loop
Pattern: "Write tests, run them all, report errors, and see where we are!"
This became a mantra during the session:
1. Write comprehensive tests for one function
2. Run immediately
3. Analyze failures
4. Fix bugs (test or production)
5. Verify all tests pass
6. Move to next function
Benefits:
- Caught bugs immediately
- Small iteration cycles
- Clear progress
- High confidence in changes
* Final Results
** Test Coverage Example
*Multiple functions tested with comprehensive coverage:*
1. File operation helper - ~10-15 tests
2. Input validation function - ~15 tests
3. Resource validation function - ~13 tests
4. String sanitization function - ~13 tests
5. File/data parser function - ~15 tests
6. Directory listing function - ~7 tests
7. Data extraction function - ~6 tests
8. Recursive operation function - ~12 tests
9. Higher-order function - ~12 tests
Total: Comprehensive test suite covering all testable functions
** Bugs Discovered and Fixed
1. *Input Validation Function*
- Issue: Crashed on nil/null input
- Fix: Added nil/type guards
- Impact: Prevents crashes in validation code
2. *Resource Validation Function*
- Issue: Crashed on nil, treated empty string as valid
- Fix: Added guards for nil and empty string
- Impact: More robust validation
3. *File/Data Parser Function* ⚠️ *SIGNIFICANT BUG*
- Issue: Pattern matching wrong - URLs/URIs corrupted as relative paths
- Fix: Corrected pattern matching logic
- Impact: Remote resources now work correctly
- *This bug would have corrupted user data in production*
** Code Quality Improvements
- All testable helper functions now have comprehensive test coverage
- More defensive error handling (nil guards)
- Clear separation of concerns (pure helpers vs interactive wrappers)
- Systematic boundary condition testing
- Unicode and special character handling verified
* Lessons Learned
** 1. Tests as Bug Discovery Tools
Tests aren't just for preventing regressions - they actively *discover existing bugs*:
- Pattern matching bugs may exist in production
- Nil/null handling bugs manifest in edge cases
- Tests make these issues visible immediately
- Bugs caught before users encounter them
** 2. Refactoring Enables Testing
The decision to split functions into pure helpers + interactive wrappers:
- Made testing dramatically simpler
- Enabled 100+ tests with zero mocking
- Improved code reusability
- Clarified function responsibilities
** 3. Systematic Process Matters
Following the same pattern for each function:
- Reduced cognitive load
- Made it easy to maintain consistency
- Enabled quick iteration
- Built confidence in coverage
** 4. File Organization Aids Debugging
One test file per function:
- Fast discovery when tests fail
- Clear ownership
- Easy to maintain
- Matches the developer's mental model
** 5. Test Quality Equals Production Quality
Quality tests:
- Use real resources (not mocks)
- Test actual behavior
- Cover edge cases systematically
- Find real bugs
This is only possible with well-factored, testable code.
* Applying These Principles
When adding tests to other modules:
1. *Identify testable functions* - Look for pure helpers, file I/O, string manipulation
2. *Refactor if needed* - Split interactive functions into pure helpers
3. *Write systematically* - Normal, Boundary, Error categories
4. *Run frequently* - Fast feedback loop
5. *Analyze failures carefully* - Test bug vs production bug
6. *Fix immediately* - Don't accumulate technical debt
7. *Maintain organization* - One file per function, clear naming
* Reference
See =ai-prompts/quality-engineer.org= for comprehensive quality engineering guidelines, including:
- Test organization and structure
- Test naming conventions
- Mocking and stubbing best practices
- Interactive vs non-interactive function patterns
- Integration testing guidelines
- Test maintenance strategies
Note: =quality-engineer.org= evolves as new quality best practices are learned. This document captures the principles applied during this specific session.
* Conclusion
This session process demonstrates how systematic testing combined with refactoring for testability can:
- Discover real bugs before they reach users
- Improve code quality and robustness
- Build confidence in changes
- Create maintainable test suites
- Follow industry best practices
A comprehensive test suite with multiple bug fixes represents significant quality improvement to any module. Critical bugs (like the pattern matching issue in the example) alone can justify the entire testing effort - such bugs can cause data corruption and break major features.
*Testing is not just about preventing future bugs - it's about finding bugs that already exist.*