#+TITLE: Test-Driven Quality Engineering Session Process
#+AUTHOR: Craig Jennings & Claude
#+DATE: 2025-11-01

* Overview

This document describes a comprehensive test-driven quality engineering session process applicable to any source code module. The session demonstrates systematic testing practices, refactoring for testability, bug discovery through tests, and the decision-making process when tests fail.

* Session Goals

1. Add comprehensive unit test coverage for the testable functions in your module
2. Discover and fix bugs through systematic testing
3. Follow the quality engineering principles from =ai-prompts/quality-engineer.org=
4. Demonstrate refactoring patterns for testability
5. Document the decision-making process for test vs production code issues

* Phase 1: Feature Addition with Testability in Mind

** The Feature Request

Add new functionality that requires user interaction combined with business logic. Example requirements:
- Present the user with options (e.g., interactive selection)
- Allow cancellation
- Perform an operation with the selected input
- Provide clear success/failure feedback

** Refactoring for Testability

Following the "Interactive vs Non-Interactive Function Pattern" from =quality-engineer.org=:

*Problem:* Implementing this directly as an interactive function would require:
- Mocking user interface components
- Mocking framework-specific APIs
- Testing UI functionality, not core business logic

*Solution:* Split into two functions (see the sketch after this list):

1. *Helper Function* (internal implementation):
   - Pure, deterministic
   - Takes explicit parameters
   - No user interaction
   - Returns values or signals errors naturally
   - 100% testable, no mocking needed

2. *Interactive Wrapper* (public interface):
   - Thin layer handling only user interaction
   - Gets input from the user/context
   - Presents UI (prompts, selections, etc.)
   - Catches errors and displays messages
   - Delegates all business logic to the helper
   - No tests needed (they would only exercise the framework UI)
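
As a minimal Python-flavored sketch of this split, assuming a hypothetical "add entry" feature (the names, prompt text, and error type are illustrative, not code from the session):

#+begin_src python
def add_entry(entries, path):
    """Pure helper: explicit parameters, no prompting, fully testable."""
    if not isinstance(path, str) or not path.strip():
        raise ValueError("path must be a non-empty string")
    return entries + [path.strip()]            # the actual business logic

def add_entry_interactive(entries):
    """Thin wrapper: prompts, handles cancellation, reports, delegates."""
    path = input("Path to add (empty to cancel): ").strip()
    if not path:                               # user cancelled
        print("Cancelled.")
        return entries
    try:
        entries = add_entry(entries, path)
        print(f"Added {path}")
    except ValueError as err:
        print(f"Could not add entry: {err}")
    return entries
#+end_src

Only =add_entry= needs unit tests; the wrapper is a thin pass-through over user interaction.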
** Benefits of This Pattern

From =quality-engineer.org=:

#+begin_quote
When writing functions that combine business logic with user interaction:
- Split into internal implementation and interactive wrapper
- Internal function: Pure logic, takes all parameters explicitly
- Dramatically simpler testing (no interactive mocking)
- Code reusable programmatically without prompts
- Clear separation of concerns (logic vs UI)
#+end_quote

This pattern enables:
- Zero mocking in tests
- Fast, deterministic tests
- Easy reasoning about correctness
- A reusable helper function

* Phase 2: Writing the First Test

** Test File Naming

Following the naming convention from =quality-engineer.org=:
- Pattern: =test-<module>--<function>.<ext>=
- One test file per function for easy discovery when tests fail
- Developer sees a failure → immediately knows which file to open

** Test Organization

Following the three-category structure:

*** Normal Cases
- Standard expected inputs
- Common use case scenarios
- Happy path operations
- Multiple operations in sequence

*** Boundary Cases
- Very long inputs
- Unicode characters (中文, emoji)
- Special characters and edge cases
- Empty or minimal data
- Maximum values

*** Error Cases
- Invalid inputs
- Nonexistent resources
- Permission denied scenarios
- Wrong type of input

** Writing Tests with Zero Mocking

Key principle: "Don't mock what you're testing" (from =quality-engineer.org=).

Example test structure:

#+begin_src
def test_function_normal_case_expected_result():
    setup()
    try:
        # Arrange
        input_data = create_test_data()
        expected_output = define_expected_result()
        # Act
        actual_output = function_under_test(input_data)
        # Assert
        assert actual_output == expected_output
    finally:
        teardown()
#+end_src

Key characteristics:
- No mocks for the function being tested
- Real resources (files, data structures) created with test utilities
- Tests exercise actual function behavior
- Clean setup/teardown
- Clear arrange-act-assert structure

** Result

When helper functions are well-factored and deterministic, tests often pass on the first run.

* Phase 3: Systematic Test Coverage Analysis

** Identifying Testable Functions

Review all functions in your module and categorize them by testability:

*** Easy to Test (Pure/Deterministic)
- Input validation functions
- String manipulation/formatting
- Data structure transformations
- File parsing (read-only operations)
- Configuration/option processing

*** Medium Complexity (Need External Resources)
- File I/O operations
- Recursive algorithms
- Data structure generation
- Cache or state management

*** Hard to Test (Framework/Context Dependencies)
- Functions requiring a specific runtime environment
- UI/buffer/window management
- Functions tightly coupled to framework internals
- Functions requiring complex mocking setup

*Decision:* Test the easy and medium-complexity functions. Skip framework-dependent functions that would require extensive mocking/setup (diminishing returns).

** File Organization Principle

From =quality-engineer.org=:

#+begin_quote
*Unit Tests*: One file per method
- Naming: =test-<module>--<function>.<ext>=
- Example: =test-module--function.ext=
#+end_quote

*Rationale:* When a test fails in CI:
1. Developer sees: =test-module--function-normal-case-returns-result FAILED=
2. Immediately knows: look in =test-module--function.<ext>=
3. Opens the file and fixes the issue - *fast cognitive path*

With combined files:
1. Test fails: =test-module--function-normal-case-returns-result FAILED=
2. Which file? =test-module--helpers.<ext>=? =test-module--combined.<ext>=?
3. Developer wastes time searching - *slower, frustrating*

*The 1:1 mapping is a usability feature for developers under pressure.*
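
To make the 1:1 mapping and the three-category structure concrete, here is a hedged Python sketch of one dedicated test file for a hypothetical =validate_input= helper (the file name, module, inputs, and expected results are assumptions for illustration, not the session's code):

#+begin_src python
# tests/test_module__validate_input.py -- one file for one function under test
from module import validate_input   # hypothetical module and helper

# Normal cases
def test_module__validate_input_normal_valid_extension_returns_true():
    assert validate_input("notes.txt") is True

# Boundary cases
def test_module__validate_input_boundary_unicode_name_returns_true():
    assert validate_input("笔记.txt") is True

# Error cases
def test_module__validate_input_error_nil_input_returns_false():
    assert validate_input(None) is False
#+end_src

Because the file tests exactly one function, a failing test name points straight back to this file.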
* Phase 4: Testing Function by Function

** Example 1: Input Validation Function

*** Test Categories

*Normal Cases:*
- Valid inputs
- Case variations
- Common use cases

*Boundary Cases:*
- Edge cases in input format
- Multiple delimiters or separators
- Empty or minimal input
- Very long input

*Error Cases:*
- Nil/null input
- Wrong type
- Malformed input

*** First Run: Most Passed, Some FAILED

*Example failure:*

#+begin_src
test-module--validate-input-error-nil-input-returns-nil
Expected: Returns nil gracefully
Actual:   (TypeError/NullPointerException) - CRASHED
#+end_src

*** Bug Analysis: Test or Production Code?

*Process:*
1. Read the test expectation: "nil input returns nil/false gracefully"
2. Read the production code:
   #+begin_src
   def validate_input(input):
       extension = get_extension(input)   # ← Crashes here on nil/null
       return extension in valid_extensions
   #+end_src
3. Identify the issue: the function expects a string and crashes on nil/null
4. Consider the context: this is defensive validation code, called in various contexts

*Decision: Fix the production code*

*Rationale:*
- Validation code should be defensive
- Returning false/nil for invalid input is more robust than crashing
- This is a common pattern in validation functions
- Better user experience

*Fix:*

#+begin_src
def validate_input(input):
    if input is None or not isinstance(input, str):   # ← Guard added
        return False
    extension = get_extension(input)
    return extension in valid_extensions
#+end_src

Result: All tests pass after adding the defensive checks.

** Example 2: Another Validation Function

*** First Run: Most Passed, Multiple FAILED

*Failures:*
1. Nil input crashed (same pattern as the previous function)
2. Empty string returned an unexpected value (edge case not handled)

*Fix:*

#+begin_src
def validate_resource(resource):
    # Guards added for nil/null and empty string
    if not resource or not isinstance(resource, str) or resource.strip() == "":
        return False
    # Original validation logic
    return is_valid_resource(resource) and meets_criteria(resource)
#+end_src

Result: All tests pass after adding the comprehensive guards.

** Example 3: String Sanitization Function

*** First Run: Most Passed, 1 FAILED

*Failure:*

#+begin_src
test-module--sanitize-boundary-special-chars-replaced
Expected: "output__________" (10 underscores)
Actual:   "output_________" (9 underscores)
#+end_src

*** Bug Analysis: Test or Production Code?

*Process:*
1. Count the special characters in the test input: 9 characters
2. The test expected 10 replacements, but the input only has 9
3. The production code is working correctly

*Decision: Fix the test code*

*The bug was in the test expectation, not the implementation.*

Result: All tests pass after correcting the test expectations.
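
For context on why the character count matters, here is a hedged sketch of what a sanitization helper of this shape might look like (the name, allowed-character set, and replacement character are assumptions, not the session's code):

#+begin_src python
import re

def sanitize(name, replacement="_"):
    # Each character outside the allowed set becomes exactly one replacement
    # character, so an input with 9 disallowed characters yields 9 underscores.
    return re.sub(r"[^A-Za-z0-9._-]", replacement, name)
#+end_src

With one replacement per disallowed character, the test's expectation of 10 underscores for a 9-character input was the bug.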
** Example 4: File/Data Parser Function

This is where a *significant bug* was discovered through testing!

*** Test Categories

*Normal Cases:*
- Absolute paths/references
- Relative paths (expanded to the base directory)
- URLs/URIs preserved as-is
- Mixed types of references

*Boundary Cases:*
- Empty lines ignored
- Whitespace-only lines ignored
- Comments ignored (format-specific)
- Leading/trailing whitespace trimmed
- Order preserved

*Error Cases:*
- Nonexistent file
- Nil/null input

*** First Run: Majority Passed, Multiple FAILED

All failures related to URL/URI handling.

*Failure Pattern:*

#+begin_src
Expected: "http://example.com/resource"
Actual:   "/base/path/http:/example.com/resource"
#+end_src

URLs were being treated as relative paths and corrupted!

*** Root Cause Analysis

*Production code:*

#+begin_src
if line.matches("^\(https?|mms\)://"):   # Pattern detection
    # Handle as URL
#+end_src

*Problem:* The pattern matching is incorrect! The pattern/regex has an error:
- Incorrect escaping or syntax
- The pattern fails to match valid URLs
- All URLs fall through to the "relative path" handler

The pattern never matched, so URLs were incorrectly processed as relative paths.

*Correct version:*

#+begin_src
if line.matches("^(https?|mms)://"):   # Fixed pattern
    # Handle as URL
#+end_src

Common causes of this type of bug:
- String escaping issues in the language
- Incorrect regex syntax
- Copy-paste errors in patterns

*** Impact Assessment

*This is a significant bug:*
- Remote resources (URLs) would be broken
- Data corruption: URLs transformed into invalid paths
- The function worked for local/simple cases, so the bug went unnoticed
- Users would see mysterious errors when using remote resources
- Potential data loss or corruption in production

*Tests caught a real production bug that could have caused user data corruption!*

Result: All tests pass after fixing the pattern-matching logic.

* Phase 5: Continuing Through the Test Suite

** Additional Functions Tested Successfully

As testing continues through the module, patterns emerge:

*Function: Directory/File Listing*
- Learning: directory listing order may be filesystem-dependent
- Solution: sort results before comparing in tests (see the sketch below)

*Function: Data Extraction*
- Keep as a separate test file (don't combine with related functions)
- Reason: usability when tests fail

*Function: Recursive Operations*
- Medium complexity: required creating test data structures/trees
- Use test utilities for setup/teardown
- Well-factored functions often pass all tests initially

*Function: Higher-Order Functions*
- Test functions that return functions/callbacks
- Initial expectations may misunderstand framework/protocol behavior
- Fix test expectations to match the actual framework behavior
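
As a hedged Python sketch of the directory-listing point above (using pytest's =tmp_path= fixture; the =list_entries= function, its bare-file-name return value, and the file names are assumptions for illustration):

#+begin_src python
from module import list_entries   # hypothetical module and function under test

def test_module__list_entries_normal_returns_all_files(tmp_path):
    # Build a tiny fixture tree; creation order is deliberately unsorted
    for name in ("b.dat", "a.dat", "c.dat"):
        (tmp_path / name).write_text("")
    result = list_entries(str(tmp_path))
    # Sort before comparing: the raw listing order is filesystem-dependent
    assert sorted(result) == ["a.dat", "b.dat", "c.dat"]
#+end_src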
* Key Principles Applied

** 1. Refactor for Testability BEFORE Writing Tests

The Interactive vs Non-Interactive pattern from =quality-engineer.org= made testing trivial:
- No mocking required
- Fast, deterministic tests
- Clear separation of concerns

** 2. Systematic Test Organization

Every test file followed the same structure:
- Normal Cases
- Boundary Cases
- Error Cases

This makes it easy to:
- Identify coverage gaps
- Add new tests
- Understand what's being tested

** 3. Test Naming Convention

Pattern: =test-<module>--<function>-<category>-<description>=

Examples:
- =test-module--validate-input-normal-valid-extension-returns-true=
- =test-module--parse-data-boundary-empty-lines-ignored=
- =test-module--sanitize-error-nil-input-signals-error=

Benefits:
- Self-documenting
- Easy to understand what failed
- Searchable/grepable
- Clear category organization

** 4. Zero Mocking for Pure Functions

From =quality-engineer.org=:

#+begin_quote
DON'T MOCK WHAT YOU'RE TESTING
- Only mock external side-effects and dependencies, not the domain logic itself
- If mocking removes the actual work the function performs, you're testing the mock
- Use real data structures that the function is designed to operate on
- Rule of thumb: If the function body could be =(error "not implemented")= and tests still pass, you've over-mocked
#+end_quote

Our tests used:
- Real file I/O
- Real strings
- Real data structures
- Actual function behavior

Result: Tests caught real bugs, not mock configuration issues.

** 5. Test vs Production Code Bug Decision Framework

When a test fails, ask:

1. *What is the test expecting?*
   - Read the test name and assertions
   - Understand the intended behavior
2. *What is the production code doing?*
   - Read the implementation
   - Trace through the logic
3. *Which is correct?*
   - Is the test expectation reasonable?
   - Is the production behavior defensive/robust?
   - What is the usage context?
4. *Consider the impact:*
   - Defensive code: fix production to handle edge cases
   - Wrong expectation: fix the test
   - Unclear spec: ask the user for clarification

Examples from our session:
- *Nil input crashes* → fix production (defensive coding)
- *Empty string treated as valid* → fix production (defensive coding)
- *Wrong count in test* → fix test (test bug)
- *Regex escaping wrong* → fix production (real bug!)

** 6. Fast Feedback Loop

Pattern: "Write tests, run them all, report errors, and see where we are!"

This became a mantra during the session:
1. Write comprehensive tests for one function
2. Run them immediately
3. Analyze failures
4. Fix bugs (test or production)
5. Verify all tests pass
6. Move to the next function

Benefits:
- Bugs caught immediately
- Small iteration cycles
- Clear progress
- High confidence in changes

* Final Results

** Test Coverage Example

*Multiple functions tested with comprehensive coverage:*

1. File operation helper - ~10-15 tests
2. Input validation function - ~15 tests
3. Resource validation function - ~13 tests
4. String sanitization function - ~13 tests
5. File/data parser function - ~15 tests
6. Directory listing function - ~7 tests
7. Data extraction function - ~6 tests
8. Recursive operation function - ~12 tests
9. Higher-order function - ~12 tests

Total: a comprehensive test suite covering all testable functions.

** Bugs Discovered and Fixed

1. *Input Validation Function*
   - Issue: crashed on nil/null input
   - Fix: added nil/type guards
   - Impact: prevents crashes in validation code
2. *Resource Validation Function*
   - Issue: crashed on nil, treated empty string as valid
   - Fix: added guards for nil and empty string
   - Impact: more robust validation
3. *File/Data Parser Function* ⚠️ *SIGNIFICANT BUG*
   - Issue: pattern matching wrong - URLs/URIs corrupted into relative paths
   - Fix: corrected the pattern-matching logic
   - Impact: remote resources now work correctly
   - *This bug would have corrupted user data in production*

** Code Quality Improvements

- All testable helper functions now have comprehensive test coverage
- More defensive error handling (nil guards)
- Clear separation of concerns (pure helpers vs interactive wrappers)
- Systematic boundary condition testing
- Unicode and special character handling verified

* Lessons Learned
** 1. Tests as Bug Discovery Tools

Tests aren't just for preventing regressions - they actively *discover existing bugs*:
- Pattern matching bugs may exist in production
- Nil/null handling bugs manifest in edge cases
- Tests make these issues visible immediately
- Bugs are caught before users encounter them

** 2. Refactoring Enables Testing

The decision to split functions into pure helpers + interactive wrappers:
- Made testing dramatically simpler
- Enabled 100+ tests with zero mocking
- Improved code reusability
- Clarified function responsibilities

** 3. Systematic Process Matters

Following the same pattern for each function:
- Reduced cognitive load
- Made it easy to maintain consistency
- Enabled quick iteration
- Built confidence in coverage

** 4. File Organization Aids Debugging

One test file per function:
- Fast discovery when tests fail
- Clear ownership
- Easy to maintain
- Follows the user's mental model

** 5. Test Quality Equals Production Quality

Quality tests:
- Use real resources (not mocks)
- Test actual behavior
- Cover edge cases systematically
- Find real bugs

This is only possible with well-factored, testable code.

* Applying These Principles

When adding tests to other modules:

1. *Identify testable functions* - look for pure helpers, file I/O, string manipulation
2. *Refactor if needed* - split interactive functions into pure helpers
3. *Write systematically* - Normal, Boundary, Error categories
4. *Run frequently* - fast feedback loop
5. *Analyze failures carefully* - test bug vs production bug
6. *Fix immediately* - don't accumulate technical debt
7. *Maintain organization* - one file per function, clear naming

* Reference

See =ai-prompts/quality-engineer.org= for comprehensive quality engineering guidelines, including:
- Test organization and structure
- Test naming conventions
- Mocking and stubbing best practices
- Interactive vs non-interactive function patterns
- Integration testing guidelines
- Test maintenance strategies

Note: =quality-engineer.org= evolves as we learn more quality best practices. This document captures the principles applied during this specific session.

* Conclusion

This session process demonstrates how systematic testing combined with refactoring for testability can:
- Discover real bugs before they reach users
- Improve code quality and robustness
- Build confidence in changes
- Create maintainable test suites
- Follow industry best practices

A comprehensive test suite with multiple bug fixes represents a significant quality improvement to any module. Critical bugs (like the pattern matching issue in the example) alone can justify the entire testing effort - such bugs can cause data corruption and break major features.

*Testing is not just about preventing future bugs - it's about finding bugs that already exist.*