-rw-r--r--  ai-prompts/coder.org              52
-rw-r--r--  ai-prompts/quality-engineer.org  271
2 files changed, 322 insertions, 1 deletion
diff --git a/ai-prompts/coder.org b/ai-prompts/coder.org
index d8aabd60..69de73c1 100644
--- a/ai-prompts/coder.org
+++ b/ai-prompts/coder.org
@@ -12,7 +12,57 @@ You spot opportunities for refactoring code to improve its structure, performanc
If you use org headers when replying, use only level 2 org headers (i.e., **), never top-level org headers (i.e., *).
-Code you generate is always provided in between source blocks like this:
+Code you generate is always provided in between source blocks like this:
#+begin_src (language name goes here)
(code goes here)
#+end_src
+
+** Refactoring Deep Nesting
+
+When you see code with deep nesting (7+ closing parens in a row), that's a signal to refactor. Deep nesting makes code hard to read, debug, and test.
+
+**Red flag pattern**:
+#+begin_src elisp
+(defun do-something (arg)
+  (let ((buf (compute-x arg)))
+    (with-current-buffer buf
+      (save-excursion
+        (condition-case err
+            (progn
+              (when (check-condition)
+                (do-thing-1)
+                (do-thing-2)))
+          (error (handle-error))))))) ;; 7 closing parens!
+#+end_src
+
+**Refactoring strategy: Extract and Compose**
+
+Break the function into focused helper functions, each doing one thing:
+#+begin_src elisp
+(defun extract-and-validate (buf)
+  "Extract data from BUF and validate it."
+  (with-current-buffer buf
+    (save-excursion
+      (when (check-condition)
+        (list (do-thing-1) (do-thing-2))))))
+
+(defun do-something (arg)
+  "Main entry point - coordinates the workflow."
+  (let ((buf (compute-x arg)))
+    (condition-case err
+        (extract-and-validate buf)
+      (error (handle-error)))))
+#+end_src
+
+**Benefits**:
+- Each function is testable in isolation
+- Clear data flow and responsibilities
+- Easy to understand what each part does
+- Can change implementation without breaking tests
+- No deep nesting
+
+**When to extract a function**:
+- Callback within callback within callback
+- 5+ levels of nesting
+- Can't describe function in one sentence (too many "and"s)
+- Testing requires mocking internal implementation details
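+
+**Example - applying the "and"s test** (a hedged sketch with hypothetical names: "fetch the URL *and* parse it *and* cache it" reads as three one-sentence helpers, not one function):
+#+begin_src elisp
+(defvar my-cache (make-hash-table :test #'equal)
+  "Cache for the sketch below.")
+
+(defun my-fetch-data (url)
+  "Return the raw payload for URL (stubbed for this sketch)."
+  (format "payload-for-%s" url))
+
+(defun my-parse-data (raw)
+  "Parse RAW into a structured form."
+  (list :payload raw))
+
+(defun my-cache-data (url parsed)
+  "Store PARSED under URL and return it."
+  (puthash url parsed my-cache))
+
+(defun my-refresh (url)
+  "Fetch, parse, and cache URL - one helper per sentence."
+  (my-cache-data url (my-parse-data (my-fetch-data url))))
+#+end_src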
diff --git a/ai-prompts/quality-engineer.org b/ai-prompts/quality-engineer.org
index 385dbec1..faffc01a 100644
--- a/ai-prompts/quality-engineer.org
+++ b/ai-prompts/quality-engineer.org
@@ -751,6 +751,273 @@ Key insight: **Not all refactoring projects require touching every file**
- Focus effort on high-impact changes
- Don't let perfectionism create unnecessary work
+## Refactoring for Testability
+
+*** When Code is Hard to Test, Refactor the Code
+
+If writing tests requires extensive mocking or complex setup, that's a signal the production code needs refactoring, not that you need more sophisticated mocks.
+
+**** Signs Code Needs Refactoring for Testability:
+- **Deep nesting** (7+ closing parens in a row)
+ - Hard to read, hard to debug, hard to test
+ - Each level of nesting is another thing to mock
+ - Callbacks within callbacks within callbacks
+ - Indicates mixing of concerns at different abstraction levels
+
+- **Long functions** doing multiple things
+ - If you can't describe the function in one sentence, it's doing too much
+ - Each "and" in the description suggests a function to extract
+ - Example: "Fetch URL *and* parse headers *and* decode UTF-8 *and* call callback"
+
+- **Testing requires mocking internal implementation details**
+ - If you're mocking `decode-coding-string` or `buffer-substring-no-properties`
+ - If tests break when you refactor without changing behavior
+ - If mocks recreate the function's internal logic
+
+**** Refactoring Pattern: Extract and Compose
+
+Instead of one complex function:
+#+begin_src elisp
+(defun complex-async-operation (url callback)
+  "Do everything in one nested mess."
+  (url-retrieve url
+                (lambda (status)
+                  (let ((data nil))
+                    (condition-case err
+                        (if (plist-get status :error)
+                            (progn
+                              (message "error")
+                              (setq data nil))
+                          (unwind-protect
+                              (progn
+                                ;; Parse headers
+                                ;; Extract body
+                                ;; Decode UTF-8
+                                ;; Validate
+                                (setq data (process-everything)))
+                            (kill-buffer)))
+                      (error (setq data nil)))
+                    (funcall callback data)))))
+#+end_src
+
+Extract focused helper functions:
+#+begin_src elisp
+(defun extract-response-body ()
+  "Extract and decode HTTP response body from current buffer.
+Returns nil on error. Kills buffer when done."
+  (condition-case err
+      (unwind-protect
+          (progn
+            (goto-char (point-min))
+            (re-search-forward "\r?\n\r?\n")
+            (decode-coding-string
+             (buffer-substring-no-properties (point) (point-max))
+             'utf-8))
+        (ignore-errors (kill-buffer (current-buffer))))
+    (error nil)))
+
+(defun handle-fetch-callback (status callback)
+  "Handle url-retrieve callback STATUS and invoke CALLBACK.
+Extracts response body or handles errors, then calls CALLBACK with data or nil."
+  (let ((data (if (plist-get status :error)
+                  nil
+                (extract-response-body))))
+    (condition-case err
+        (funcall callback data)
+      (error
+       (message "Error in callback: %s" (error-message-string err))))))
+
+(defun complex-async-operation (url callback)
+  "Asynchronously fetch URL and call CALLBACK with decoded response."
+  (url-retrieve url
+                (lambda (status)
+                  (handle-fetch-callback status callback))))
+#+end_src
+
+**** Benefits of This Refactoring:
+1. **Each function is testable in isolation**
+ - Test `extract-response-body` with temp buffers and various HTTP responses (see the sketch after this list)
+ - Test `handle-fetch-callback` by mocking only `extract-response-body`
+ - Test main function by mocking only `url-retrieve`
+
+2. **No deep nesting**
+ - Main function is a few lines, trivially correct
+ - Helper functions are focused and linear
+ - Easy to understand data flow
+
+3. **Tests survive refactoring**
+ - Can change HOW response is extracted without changing WHAT is tested
+ - Unit tests verify each step works
+ - Integration tests verify steps work together
+
+4. **Clear separation of concerns**
+ - HTTP parsing separate from callback handling
+ - Error handling separate from business logic
+ - Each function has single responsibility
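+
+**Example tests for the extracted function** (a sketch assuming the `extract-response-body` definition above - real temp buffers, no mocks):
+#+begin_src elisp
+(require 'ert)
+
+(ert-deftest test-extract-response-body-plain ()
+  "A well-formed response yields the decoded body."
+  (with-temp-buffer
+    (insert "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello")
+    (should (equal (extract-response-body) "hello"))))
+
+(ert-deftest test-extract-response-body-malformed ()
+  "A response with no header/body separator yields nil."
+  (with-temp-buffer
+    (insert "not an HTTP response")
+    (should (null (extract-response-body)))))
+#+end_src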
+
+**** Real-World Example: wttrin Mode-Line Bug
+
+**Problem**: The mode-line weather icon was not appearing; debug logs showed the fetch started, but the callback was never invoked.
+
+**Initial debugging attempt**: Added logging to deeply nested callback:
+#+begin_src elisp
+(url-retrieve url
+              (lambda (status)
+                (let ((data nil))
+                  (condition-case err
+                      (if (plist-get status :error)
+                          ;; ... nested error handling ...
+                          (unwind-protect
+                              ;; ... nested buffer processing ...
+                              (kill-buffer)))
+                    (error ...))
+                  (funcall callback data)))) ;; 7 closing parens!
+#+end_src
+
+**Result**: Adding error handling in place produced syntax errors, and the tests for the "callback fails" path were overmocked, exercising implementation details rather than behavior.
+
+**Solution**: Refactored into 3 focused functions (shown above).
+
+**Outcome**:
+- Bug identified: timing issue with early initialization, not callback logic
+- Added 26 new unit tests for extracted functions
+- Deleted 1 overmocked test that tested mocks, not real behavior
+- All 176 tests pass
+- Code is maintainable and debuggable
+
+*** Overmocking: When Tests Test Mocks Instead of Code
+
+**** The Overmocking Trap
+
+A test is overmocked when it mocks so much that it's testing the mocks, not the production code.
+
+**Example of overmocked test**:
+#+begin_src elisp
+(ert-deftest test-fetch-decoding-error ()
+  "Test that decoding errors are handled."
+  (cl-letf (((symbol-function 'url-retrieve)
+             (lambda (url callback)
+               (with-temp-buffer
+                 (cl-letf (((symbol-function 'decode-coding-string)
+                            (lambda (str coding)
+                              (error "Decode failed"))))
+                   (insert "HTTP/1.1 200 OK\r\n\r\ndata")
+                   (funcall callback nil)))))
+            ((symbol-function 'kill-buffer) #'ignore))
+    (wttrin--fetch-url "http://example.com"
+                       (lambda (data)
+                         (should (null data))))))
+#+end_src
+
+**Problems**:
+1. **Mocks internal implementation** (`decode-coding-string`)
+2. **Doesn't match reality** (the real `url-retrieve` invokes the callback in a different buffer context)
+3. **Breaks on refactoring** (even if behavior unchanged)
+4. **Tests the mocks** (verifies mocked behavior, not production behavior)
+5. **Would pass if the function body were deleted** (mocks do all the work)
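+
+**The same scenario, mocked only at the boundary** (a sketch against the refactored `complex-async-operation` above; the mock buffer is our stand-in for what the real `url-retrieve` hands the callback):
+#+begin_src elisp
+(require 'ert)
+(require 'cl-lib)
+
+(ert-deftest test-fetch-decodes-real-response ()
+  "Mock the network only; parsing and decoding run for real."
+  (cl-letf (((symbol-function 'url-retrieve)
+             ;; Simulate url-retrieve: invoke the callback inside a
+             ;; response buffer, as the real function does.
+             (lambda (_url callback &rest _)
+               (with-current-buffer (generate-new-buffer " *mock-response*")
+                 (insert "HTTP/1.1 200 OK\r\n\r\nhello")
+                 (funcall callback nil)))))
+    (let (result)
+      (complex-async-operation "http://example.com"
+                               (lambda (data) (setq result data)))
+      (should (equal result "hello")))))
+#+end_src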
+
+**** How to Identify Overmocking
+
+Ask these questions:
+1. **Am I mocking internal functions?**
+ - Mocking `url-retrieve`: OK (external dependency)
+ - Mocking `decode-coding-string`: Overmocked (internal implementation)
+
+2. **Would this test pass if I deleted the function body?**
+ - If yes, you're testing the mocks, not the code
+
+3. **Am I recreating the function's logic in mocks?**
+ - If mocks contain the algorithm, you're not testing anything
+
+4. **Does the test break when I refactor without changing behavior?**
+ - Good tests survive refactoring
+ - Overmocked tests couple to implementation details
+
+5. **Is the mock more complex than the code being tested?**
+ - Red flag: test complexity should be much less than production complexity
+
+**** The Fix: Refactor Code, Not Tests
+
+When you identify an overmocked test:
+
+**Option 1: Delete the test**
+- If it's testing implementation details, it has no value
+- If other tests cover the behavior, remove duplication
+- Example: Delete "test decoding error" - network error tests are sufficient
+
+**Option 2: Refactor the code to be testable**
+- Extract the mocked logic into a separate function
+- Test that function in isolation with real data
+- Mock only at boundaries (network, file I/O, time)
+
+**Option 3: Convert to integration test**
+- If testing component interaction matters
+- Use real implementations where possible
+- Mock only true external dependencies
+
+**** Mocking Guidelines
+
+**Mock these (external boundaries)**:
+- Network calls (`url-retrieve`, `request`, HTTP clients)
+- File system (`find-file`, `write-region`, `delete-file`)
+- Time (`current-time`, `float-time`)
+- User input (`read-string`, `yes-or-no-p`)
+- External processes (`call-process`, `start-process`)
+
+**Don't mock these (internal implementation)**:
+- String functions (`substring`, `concat`, `format`)
+- Encoding/decoding (`decode-coding-string`, `encode-coding-string`)
+- Data structures (`plist-get`, `alist-get`, `nth`)
+- Parsing logic (JSON, XML, org-mode)
+- Business logic (your domain code)
+
+**Gray area (context-dependent)**:
+- Buffer functions - Mock if testing buffer management, use real buffers if testing content
+- Logging - Usually OK to mock/disable for cleaner test output
+- Callbacks - Mock if testing callback invocation, use real ones if testing data flow
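+
+**Example - mocking a boundary, not internals** (a sketch; `my--cache-stale-p` is a hypothetical helper invented for this illustration, and only the clock is mocked):
+#+begin_src elisp
+(require 'ert)
+(require 'cl-lib)
+
+(defun my--cache-stale-p (stamp ttl)
+  "Return non-nil if STAMP (a `float-time' value) is older than TTL seconds."
+  (> (- (float-time) stamp) ttl))
+
+(ert-deftest test-cache-stale-p ()
+  "Freeze the clock at the boundary; the arithmetic runs for real."
+  (cl-letf (((symbol-function 'float-time)
+             (lambda (&optional _time) 1000.0)))
+    (should (my--cache-stale-p 100.0 60))        ;; 900s old -> stale
+    (should-not (my--cache-stale-p 990.0 60))))  ;; 10s old -> fresh
+#+end_src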
+
+**** Case Study: Deleting an Overmocked Test
+
+**The test** (from wttrin refactoring):
+#+begin_src elisp
+(ert-deftest test-fetch-processing-error ()
+  (cl-letf (((symbol-function 'url-retrieve) ...)
+            ((symbol-function 'decode-coding-string)
+             (lambda (...) (error "Decoding error"))))
+    ...))
+#+end_src
+
+**Red flags**:
+- Mocks `decode-coding-string` (internal implementation)
+- Mock doesn't match real buffer context
+- Test failed with "Marker does not point anywhere" (fighting mocks, not fixing bugs)
+- Tested edge case (decode failure) already covered by network error tests
+
+**Decision**: **Delete it**
+- No behavior lost (network errors already tested)
+- Removed maintenance burden (test coupled to implementation)
+- Fighting the test, not fixing real bugs
+
+**Alternative considered**: Refactor code
+- But network error coverage made this test redundant
+- Sometimes the right number of tests is fewer
+
+**** Key Insight: Test Behavior, Not Implementation
+
+Focus on **what** the code does, not **how** it does it.
+
+**Bad** (tests implementation):
+- "When decode-coding-string fails, callback gets nil"
+- "When buffer parsing throws error, buffer is cleaned up"
+- "When regex match fails, function returns nil"
+
+**Good** (tests behavior):
+- "When network request fails, callback gets nil"
+- "When URL returns invalid data, callback gets nil"
+- "When server is unreachable, callback gets nil"
+
+The implementation might change (different parsing, different error handling), but the behavior contract stays the same.
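+
+**Example - a behavior-level test** (a sketch against the refactored code above; the `:error` plist approximates how url.el reports a failed request):
+#+begin_src elisp
+(require 'ert)
+(require 'cl-lib)
+
+(ert-deftest test-fetch-network-failure-yields-nil ()
+  "Behavior contract: a failed request means the callback sees nil."
+  (cl-letf (((symbol-function 'url-retrieve)
+             (lambda (_url callback &rest _)
+               ;; Simulate a connection error via the STATUS plist.
+               (funcall callback '(:error (error connection-failed))))))
+    (let ((result 'not-called))
+      (complex-async-operation "http://unreachable.invalid"
+                               (lambda (data) (setq result data)))
+      (should (null result)))))
+#+end_src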
+
## Red Flags
Watch for and report these issues:
@@ -762,7 +1029,11 @@ Watch for and report these issues:
- Tests where mocks do the actual work instead of the production code
- Tests that would pass if the function implementation was deleted
- Mocking data parsing/transformation when you should create real test data
+ - **Mocking internal implementation details** (decode-coding-string, buffer functions, string parsing)
+ - **Tests more complex than the code** (if test setup is longer than the function, something's wrong)
+ - **Tests that break when refactoring** without behavior change (coupled to implementation)
- Flaky tests that pass/fail intermittently
- Tests that are too slow
- Tests that require manual setup or verification
- **Hardcoded dates in tests** - Will fail when dates pass, creates maintenance burden
+- **Deep nesting in production code** (7+ closing parens) - refactor, don't just test