| author | Craig Jennings <c@cjennings.net> | 2025-11-09 15:33:51 -0600 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2025-11-09 15:33:51 -0600 |
| commit | 3b1fa4467af888ff1cdb0f89ff0dbcacfa2dd1b8 (patch) | |
| tree | da997a4c38bc18efec819421037340e613eda6f8 /ai-prompts | |
| parent | 7c90919de0ab52efbf3fc18072a44fdc5bca0cd7 (diff) | |
doc:ai-prompts: Add guide on refactoring for testability
Introduce a new section in 'quality-engineer.org' detailing how to
refactor code to enhance testability. Emphasize recognizing deep
nesting, long functions, and overmocking as refactoring signals.
Provide strategies such as extracting functions and clear
separation of concerns. Offer real-world examples and guidelines on
effective mocking so that tests remain resilient to implementation
changes. Enhance 'coder.org' with deeper guidance on handling
deep-nesting scenarios.
Diffstat (limited to 'ai-prompts')
| -rw-r--r-- | ai-prompts/coder.org | 52 | ||||
| -rw-r--r-- | ai-prompts/quality-engineer.org | 271 |
2 files changed, 322 insertions, 1 deletion
diff --git a/ai-prompts/coder.org b/ai-prompts/coder.org
index d8aabd60..69de73c1 100644
--- a/ai-prompts/coder.org
+++ b/ai-prompts/coder.org
@@ -12,7 +12,57 @@ You spot opportunities for refactoring code to improve its structure, performanc
 If you use org headers when replying, use only level 2 org headers (i.e., **), never top level org headers (i.e., *).
-Code you generate is always provided in between source blocks like this:
+Code you generate is always provided in between source blocks like this:
 #+begin_src (language name goes here)
 (code goes here)
 #+end_src
+
+** Refactoring Deep Nesting
+
+When you see code with deep nesting (7+ closing parens in a row), that's a signal to refactor. Deep nesting makes code hard to read, debug, and test.
+
+**Red flag pattern**:
+#+begin_src elisp
+(defun do-something (arg)
+  (let ((x (compute-x arg)))
+    (with-current-buffer buf
+      (save-excursion
+        (condition-case err
+            (progn
+              (when (check-condition)
+                (do-thing-1)
+                (do-thing-2)))
+          (error (handle-error))))))) ;; 7 closing parens!
+#+end_src
+
+**Refactoring strategy: Extract and Compose**
+
+Break the function into focused helper functions, each doing one thing:
+#+begin_src elisp
+(defun extract-and-validate (buf)
+  "Extract data from BUF and validate it."
+  (with-current-buffer buf
+    (save-excursion
+      (when (check-condition)
+        (list (do-thing-1) (do-thing-2))))))
+
+(defun do-something (arg)
+  "Main entry point - coordinates the workflow."
+  (let ((x (compute-x arg)))
+    (condition-case err
+        (extract-and-validate buf)
+      (error (handle-error)))))
+#+end_src
+
+**Benefits**:
+- Each function is testable in isolation
+- Clear data flow and responsibilities
+- Easy to understand what each part does
+- Can change implementation without breaking tests
+- No deep nesting
+
+**When to extract a function**:
+- Callback within callback within callback
+- 5+ levels of nesting
+- Can't describe function in one sentence (too many "and"s)
+- Testing requires mocking internal implementation details
diff --git a/ai-prompts/quality-engineer.org b/ai-prompts/quality-engineer.org
index 385dbec1..faffc01a 100644
--- a/ai-prompts/quality-engineer.org
+++ b/ai-prompts/quality-engineer.org
@@ -751,6 +751,273 @@ Key insight: **Not all refactoring projects require touching every file**
 - Focus effort on high-impact changes
 - Don't let perfectionism create unnecessary work
+## Refactoring for Testability
+
+*** When Code is Hard to Test, Refactor the Code
+
+If writing tests requires extensive mocking or complex setup, that's a signal the production code needs refactoring, not that you need more sophisticated mocks.
+
+**** Signs Code Needs Refactoring for Testability:
+- **Deep nesting** (7+ closing parens in a row)
+  - Hard to read, hard to debug, hard to test
+  - Each level of nesting is another thing to mock
+  - Callbacks within callbacks within callbacks
+  - Indicates mixing of concerns at different abstraction levels
+
+- **Long functions** doing multiple things
+  - If you can't describe the function in one sentence, it's doing too much
+  - Each "and" in the description suggests a function to extract
+  - Example: "Fetch URL *and* parse headers *and* decode UTF-8 *and* call callback"
+
+- **Testing requires mocking internal implementation details**
+  - If you're mocking `decode-coding-string` or `buffer-substring-no-properties`
+  - If tests break when you refactor without changing behavior
+  - If mocks recreate the function's internal logic
+
+**** Refactoring Pattern: Extract and Compose
+
+Instead of one complex function:
+#+begin_src elisp
+(defun complex-async-operation (url callback)
+  "Do everything in one nested mess."
+  (url-retrieve url
+    (lambda (status)
+      (let ((data nil))
+        (condition-case err
+            (if (plist-get status :error)
+                (progn
+                  (log "error")
+                  (setq data nil))
+              (unwind-protect
+                  (progn
+                    ;; Parse headers
+                    ;; Extract body
+                    ;; Decode UTF-8
+                    ;; Validate
+                    (setq data (process-everything)))
+                (kill-buffer)))
+          (error (setq data nil)))
+        (funcall callback data)))))
+#+end_src
+
+Extract focused helper functions:
+#+begin_src elisp
+(defun extract-response-body ()
+  "Extract and decode HTTP response body from current buffer.
+Returns nil on error. Kills buffer when done."
+  (condition-case err
+      (unwind-protect
+          (progn
+            (goto-char (point-min))
+            (re-search-forward "\r?\n\r?\n")
+            (decode-coding-string
+             (buffer-substring-no-properties (point) (point-max))
+             'utf-8))
+        (ignore-errors (kill-buffer (current-buffer))))
+    (error nil)))
+
+(defun handle-fetch-callback (status callback)
+  "Handle url-retrieve callback STATUS and invoke CALLBACK.
+Extracts response body or handles errors, then calls CALLBACK with data or nil."
+  (let ((data (if (plist-get status :error)
+                  nil
+                (extract-response-body))))
+    (condition-case err
+        (funcall callback data)
+      (error
+       (message "Error in callback: %s" (error-message-string err))))))
+
+(defun complex-async-operation (url callback)
+  "Asynchronously fetch URL and call CALLBACK with decoded response."
+  (url-retrieve url
+    (lambda (status)
+      (handle-fetch-callback status callback))))
+#+end_src
+
+**** Benefits of This Refactoring:
+1. **Each function is testable in isolation**
+   - Test `extract-response-body` with temp buffers and various HTTP responses
+   - Test `handle-fetch-callback` by mocking only `extract-response-body`
+   - Test main function by mocking only `url-retrieve`
+
+2. **No deep nesting**
+   - Main function is 4 lines, trivially correct
+   - Helper functions are focused and linear
+   - Easy to understand data flow
+
+3. **Tests survive refactoring**
+   - Can change HOW response is extracted without changing WHAT is tested
+   - Unit tests verify each step works
+   - Integration tests verify steps work together
+
+4. **Clear separation of concerns**
+   - HTTP parsing separate from callback handling
+   - Error handling separate from business logic
+   - Each function has single responsibility
+
+**** Real-World Example: wttrin Mode-Line Bug
+
+**Problem**: Mode-line weather icon not appearing; debug logs showed fetch started but callback never invoked.
+
+**Initial debugging attempt**: Added logging to deeply nested callback:
+#+begin_src elisp
+(url-retrieve url
+  (lambda (status)
+    (let ((data nil))
+      (condition-case err
+          (if (plist-get status :error)
+              ;; ... nested error handling ...
+            (unwind-protect
+                ;; ... nested buffer processing ...
+              (kill-buffer)))
+        (error ...))
+      (funcall callback data)))) ;; 7 closing parens!
+#+end_src
+
+**Result**: Syntax errors from trying to add error handling. Tests for "callback fails" were overmocked and testing implementation details.
+
+**Solution**: Refactored into 3 focused functions (shown above).
+
+**Outcome**:
+- Bug identified: timing issue with early initialization, not callback logic
+- Added 26 new unit tests for extracted functions
+- Deleted 1 overmocked test that tested mocks, not real behavior
+- All 176 tests pass
+- Code is maintainable and debuggable
+
+*** Overmocking: When Tests Test Mocks Instead of Code
+
+**** The Overmocking Trap
+
+A test is overmocked when it mocks so much that it's testing the mocks, not the production code.
+
+**Example of overmocked test**:
+#+begin_src elisp
+(ert-deftest test-fetch-decoding-error ()
+  "Test that decoding errors are handled."
+  (cl-letf (((symbol-function 'url-retrieve)
+             (lambda (url callback)
+               (with-temp-buffer
+                 (cl-letf (((symbol-function 'decode-coding-string)
+                            (lambda (str coding)
+                              (error "Decode failed"))))
+                   (insert "HTTP/1.1 200 OK\r\n\r\ndata")
+                   (funcall callback nil)))))
+            ((symbol-function 'kill-buffer) #'ignore))
+    (wttrin--fetch-url "http://example.com"
+                       (lambda (data)
+                         (should (null data))))))
+#+end_src
+
+**Problems**:
+1. **Mocks internal implementation** (`decode-coding-string`)
+2. **Doesn't match reality** (real `url-retrieve` creates different buffer context)
+3. **Breaks on refactoring** (even if behavior unchanged)
+4. **Tests the mocks** (verifies mocked behavior, not production behavior)
+5. **Would pass if function was deleted** (mocks do all the work)
+
+**** How to Identify Overmocking
+
+Ask these questions:
+1. **Am I mocking internal functions?**
+   - Mocking `url-retrieve`: OK (external dependency)
+   - Mocking `decode-coding-string`: Overmocked (internal implementation)
+
+2. **Would this test pass if I deleted the function body?**
+   - If yes, you're testing the mocks, not the code
+
+3. **Am I recreating the function's logic in mocks?**
+   - If mocks contain the algorithm, you're not testing anything
+
+4. **Does the test break when I refactor without changing behavior?**
+   - Good tests survive refactoring
+   - Overmocked tests couple to implementation details
+
+5. **Is the mock more complex than the code being tested?**
+   - Red flag: test complexity should be much less than production complexity
+
+**** The Fix: Refactor Code, Not Tests
+
+When you identify an overmocked test:
+
+**Option 1: Delete the test**
+- If it's testing implementation details, it has no value
+- If other tests cover the behavior, remove duplication
+- Example: Delete "test decoding error" - network error tests are sufficient
+
+**Option 2: Refactor the code to be testable**
+- Extract the mocked logic into a separate function
+- Test that function in isolation with real data
+- Mock only at boundaries (network, file I/O, time)
+
+**Option 3: Convert to integration test**
+- If testing component interaction matters
+- Use real implementations where possible
+- Mock only true external dependencies
+
+**** Mocking Guidelines
+
+**Mock these (external boundaries)**:
+- Network calls (`url-retrieve`, `request`, HTTP clients)
+- File system (`find-file`, `write-region`, `delete-file`)
+- Time (`current-time`, `float-time`)
+- User input (`read-string`, `yes-or-no-p`)
+- External processes (`call-process`, `start-process`)
+
+**Don't mock these (internal implementation)**:
+- String functions (`substring`, `concat`, `format`)
+- Encoding/decoding (`decode-coding-string`, `encode-coding-string`)
+- Data structures (`plist-get`, `alist-get`, `nth`)
+- Parsing logic (JSON, XML, org-mode)
+- Business logic (your domain code)
+
+**Gray area (context-dependent)**:
+- Buffer functions - Mock if testing buffer management, use real buffers if testing content
+- Logging - Usually OK to mock/disable for cleaner test output
+- Callbacks - Mock if testing callback invocation, use real ones if testing data flow
+
+**** Case Study: Deleting an Overmocked Test
+
+**The test** (from the wttrin refactoring):
+#+begin_src elisp
+(ert-deftest test-fetch-processing-error ()
+  (cl-letf (((symbol-function 'url-retrieve) ...)
+            ((symbol-function 'decode-coding-string)
+             (lambda (...) (error "Decoding error"))))
+    ...))
+#+end_src
+
+**Red flags**:
+- Mocks `decode-coding-string` (internal implementation)
+- Mock doesn't match real buffer context
+- Test failed with "Marker does not point anywhere" (fighting mocks, not fixing bugs)
+- Tested edge case (decode failure) already covered by network error tests
+
+**Decision**: **Delete it**
+- No behavior lost (network errors already tested)
+- Removed maintenance burden (test coupled to implementation)
+- Fighting the test, not fixing real bugs
+
+**Alternative considered**: Refactor the code
+- But network error coverage made this test redundant
+- Sometimes the right test count is fewer tests
+
+**** Key Insight: Test Behavior, Not Implementation
+
+Focus on **what** the code does, not **how** it does it.
+
+**Bad** (tests implementation):
+- "When decode-coding-string fails, callback gets nil"
+- "When buffer parsing throws error, buffer is cleaned up"
+- "When regex match fails, function returns nil"
+
+**Good** (tests behavior):
+- "When network request fails, callback gets nil"
+- "When URL returns invalid data, callback gets nil"
+- "When server is unreachable, callback gets nil"
+
+The implementation might change (different parsing, different error handling), but the behavior contract stays the same.
+
 ## Red Flags
 
 Watch for and report these issues:
@@ -762,7 +1029,11 @@ Watch for and report these issues:
 - Tests where mocks do the actual work instead of the production code
 - Tests that would pass if the function implementation was deleted
 - Mocking data parsing/transformation when you should create real test data
+  - **Mocking internal implementation details** (decode-coding-string, buffer functions, string parsing)
+  - **Tests more complex than the code** (if test setup is longer than the function, something's wrong)
+  - **Tests that break when refactoring** without behavior change (coupled to implementation)
 - Flaky tests that pass/fail intermittently
 - Tests that are too slow
 - Tests that require manual setup or verification
 - **Hardcoded dates in tests** - Will fail when dates pass, creates maintenance burden
+- **Deep nesting in production code** (7+ closing parens) - refactor, don't just test
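As a footnote to the patch's mocking guidelines, here is a minimal ERT sketch of boundary-only mocking. It assumes the `extract-response-body` and refactored `complex-async-operation` definitions from the example code in the patch; the canned HTTP response text and test names are invented for illustration, not taken from the wttrin test suite.

#+begin_src elisp
;; Sketch: mock only the network boundary (`url-retrieve'); let the
;; real parsing code run against real buffers.
(require 'ert)
(require 'cl-lib)

(ert-deftest sketch-extract-response-body-real-buffer ()
  "Run the real parser against a realistic raw HTTP response."
  (with-temp-buffer
    (insert "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello")
    (should (equal (extract-response-body) "hello"))))

(ert-deftest sketch-complex-async-operation-boundary-mock ()
  "Mock `url-retrieve' only; parsing and callback dispatch run for real."
  (cl-letf (((symbol-function 'url-retrieve)
             (lambda (_url callback &rest _args)
               ;; Simulate url-retrieve: invoke CALLBACK inside a
               ;; buffer holding a canned raw response (invented data).
               (with-temp-buffer
                 (insert "HTTP/1.1 200 OK\r\n\r\nhello")
                 (funcall callback nil)))))
    (complex-async-operation "http://example.com"
                             (lambda (data)
                               (should (equal data "hello"))))))
#+end_src

Note that neither test touches `decode-coding-string` or any other internal: if `extract-response-body` is later rewritten, only behavior changes would make these tests fail.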
