 ai-prompts/coder.org            |  52 +-
 ai-prompts/quality-engineer.org | 271 +
 2 files changed, 322 insertions(+), 1 deletion(-)
diff --git a/ai-prompts/coder.org b/ai-prompts/coder.org
index d8aabd60..69de73c1 100644
--- a/ai-prompts/coder.org
+++ b/ai-prompts/coder.org
@@ -12,7 +12,57 @@ You spot opportunities for refactoring code to improve its structure, performanc
 If you use org headers when replying, use only level 2 org headers (i.e., **), never top level org headers (i.e., *).
 
-Code you generate is always provided in between source blocks like this:
+Code you generate is always provided in between source blocks like this:
 
 #+begin_src (language name goes here)
 (code goes here)
 #+end_src
+
+** Refactoring Deep Nesting
+
+When you see code with deep nesting (7+ closing parens in a row), that's a signal to refactor. Deep nesting makes code hard to read, debug, and test.
+
+**Red flag pattern**:
+#+begin_src elisp
+(defun do-something (arg)
+  (let ((x (compute-x arg)))
+    (with-current-buffer x
+      (save-excursion
+        (condition-case err
+            (progn
+              (when (check-condition)
+                (do-thing-1)
+                (do-thing-2)))
+          (error (handle-error)))))))  ;; 7 closing parens!
+#+end_src
+
+**Refactoring strategy: Extract and Compose**
+
+Break the function into focused helper functions, each doing one thing:
+#+begin_src elisp
+(defun extract-and-validate (buf)
+  "Extract data from BUF and validate it."
+  (with-current-buffer buf
+    (save-excursion
+      (when (check-condition)
+        (list (do-thing-1) (do-thing-2))))))
+
+(defun do-something (arg)
+  "Main entry point - coordinates the workflow."
+  (let ((x (compute-x arg)))
+    (condition-case err
+        (extract-and-validate x)
+      (error (handle-error)))))
+#+end_src
+
+**Benefits**:
+- Each function is testable in isolation (see the sketch below)
+- Clear data flow and responsibilities
+- Easy to understand what each part does
+- Can change the implementation without breaking tests
+- No deep nesting
+
+**When to extract a function**:
+- Callback within callback within callback
+- 5+ levels of nesting
+- Can't describe the function in one sentence (too many "and"s)
+- Testing requires mocking internal implementation details
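+
+A minimal sketch of what "testable in isolation" looks like for the
+extracted helper, assuming ERT. The toy definitions stand in for the
+placeholder helpers above (`check-condition`, `do-thing-1`, and
+`do-thing-2` are illustrative, not real APIs):
+#+begin_src elisp
+(require 'ert)
+
+;; Toy stand-ins for the placeholder helpers, so the sketch runs.
+(defun check-condition () (looking-at-p "sample"))
+(defun do-thing-1 () 'one)
+(defun do-thing-2 () 'two)
+
+(ert-deftest my-extract-and-validate-test ()
+  "Exercise the extracted helper directly with a real buffer, no mocks."
+  (with-temp-buffer
+    (insert "sample input")
+    (goto-char (point-min))
+    (should (equal (extract-and-validate (current-buffer))
+                   '(one two)))))
+#+end_src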
diff --git a/ai-prompts/quality-engineer.org b/ai-prompts/quality-engineer.org
index 385dbec1..faffc01a 100644
--- a/ai-prompts/quality-engineer.org
+++ b/ai-prompts/quality-engineer.org
@@ -751,6 +751,273 @@ Key insight: **Not all refactoring projects require touching every file**
 - Focus effort on high-impact changes
 - Don't let perfectionism create unnecessary work
+## Refactoring for Testability
+
+*** When Code is Hard to Test, Refactor the Code
+
+If writing tests requires extensive mocking or complex setup, that's a signal the production code needs refactoring, not that you need more sophisticated mocks.
+
+**** Signs Code Needs Refactoring for Testability:
+- **Deep nesting** (7+ closing parens in a row)
+  - Hard to read, hard to debug, hard to test
+  - Each level of nesting is another thing to mock
+  - Callbacks within callbacks within callbacks
+  - Indicates mixing of concerns at different abstraction levels
+
+- **Long functions** doing multiple things
+  - If you can't describe the function in one sentence, it's doing too much
+  - Each "and" in the description suggests a function to extract
+  - Example: "Fetch URL *and* parse headers *and* decode UTF-8 *and* call callback"
+
+- **Testing requires mocking internal implementation details**
+  - If you're mocking `decode-coding-string` or `buffer-substring-no-properties`
+  - If tests break when you refactor without changing behavior
+  - If mocks recreate the function's internal logic
+
+**** Refactoring Pattern: Extract and Compose
+
+Instead of one complex function:
+#+begin_src elisp
+(defun complex-async-operation (url callback)
+  "Do everything in one nested mess."
+  (url-retrieve url
+                (lambda (status)
+                  (let ((data nil))
+                    (condition-case err
+                        (if (plist-get status :error)
+                            (progn
+                              (message "error")
+                              (setq data nil))
+                          (unwind-protect
+                              (progn
+                                ;; Parse headers
+                                ;; Extract body
+                                ;; Decode UTF-8
+                                ;; Validate
+                                (setq data (process-everything)))
+                            (kill-buffer)))
+                      (error (setq data nil)))
+                    (funcall callback data)))))
+#+end_src
+
+Extract focused helper functions:
+#+begin_src elisp
+(defun extract-response-body ()
+  "Extract and decode HTTP response body from current buffer.
+Returns nil on error. Kills buffer when done."
+  (condition-case err
+      (unwind-protect
+          (progn
+            (goto-char (point-min))
+            (re-search-forward "\r?\n\r?\n")
+            (decode-coding-string
+             (buffer-substring-no-properties (point) (point-max))
+             'utf-8))
+        (ignore-errors (kill-buffer (current-buffer))))
+    (error nil)))
+
+(defun handle-fetch-callback (status callback)
+  "Handle url-retrieve callback STATUS and invoke CALLBACK.
+Extracts the response body or handles errors, then calls CALLBACK
+with the data or nil."
+  (let ((data (if (plist-get status :error)
+                  nil
+                (extract-response-body))))
+    (condition-case err
+        (funcall callback data)
+      (error
+       (message "Error in callback: %s" (error-message-string err))))))
+
+(defun complex-async-operation (url callback)
+  "Asynchronously fetch URL and call CALLBACK with the decoded response."
+  (url-retrieve url
+                (lambda (status)
+                  (handle-fetch-callback status callback))))
+#+end_src
+
+**** Benefits of This Refactoring:
+1. **Each function is testable in isolation** (see the sketch below)
+   - Test `extract-response-body` with temp buffers and various HTTP responses
+   - Test `handle-fetch-callback` by mocking only `extract-response-body`
+   - Test the main function by mocking only `url-retrieve`
+
+2. **No deep nesting**
+   - The main function is 4 lines, trivially correct
+   - Helper functions are focused and linear
+   - Easy to understand the data flow
+
+3. **Tests survive refactoring**
+   - Can change HOW the response is extracted without changing WHAT is tested
+   - Unit tests verify each step works
+   - Integration tests verify the steps work together
+
+4. **Clear separation of concerns**
+   - HTTP parsing separate from callback handling
+   - Error handling separate from business logic
+   - Each function has a single responsibility
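+
+A sketch of benefit 1: the extracted helper is exercised with real
+buffers and real data, assuming ERT and the definitions above (test
+names and inputs are illustrative):
+#+begin_src elisp
+(require 'ert)
+
+(ert-deftest my-extract-response-body-valid-test ()
+  "A well-formed response yields the decoded body -- no mocks needed."
+  (with-temp-buffer
+    (insert "HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello")
+    (should (equal (extract-response-body) "hello"))))
+
+(ert-deftest my-extract-response-body-malformed-test ()
+  "A response with no header/body separator returns nil."
+  (with-temp-buffer
+    (insert "HTTP/1.1 200 OK")
+    (should (null (extract-response-body)))))
+#+end_src
+
+Note that `with-temp-buffer` tolerates the helper killing the buffer:
+its cleanup only runs `kill-buffer` if the buffer is still live.
+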
+**** Real-World Example: wttrin Mode-Line Bug
+
+**Problem**: The mode-line weather icon was not appearing; debug logs showed the fetch started but the callback was never invoked.
+
+**Initial debugging attempt**: Added logging to the deeply nested callback:
+#+begin_src elisp
+(url-retrieve url
+    (lambda (status)
+      (let ((data nil))
+        (condition-case err
+            (if (plist-get status :error)
+                ;; ... nested error handling ...
+                (unwind-protect
+                    ;; ... nested buffer processing ...
+                  (kill-buffer)))
+          (error ...))
+        (funcall callback data))))  ;; 7 closing parens!
+#+end_src
+
+**Result**: Syntax errors from trying to add error handling. Tests for "callback fails" were overmocked and tested implementation details.
+
+**Solution**: Refactored into 3 focused functions (shown above).
+
+**Outcome**:
+- Bug identified: a timing issue with early initialization, not callback logic
+- Added 26 new unit tests for the extracted functions
+- Deleted 1 overmocked test that tested mocks, not real behavior
+- All 176 tests pass
+- Code is maintainable and debuggable
+
+*** Overmocking: When Tests Test Mocks Instead of Code
+
+**** The Overmocking Trap
+
+A test is overmocked when it mocks so much that it's testing the mocks, not the production code.
+
+**Example of an overmocked test**:
+#+begin_src elisp
+(ert-deftest test-fetch-decoding-error ()
+  "Test that decoding errors are handled."
+  (cl-letf (((symbol-function 'url-retrieve)
+             (lambda (url callback)
+               (with-temp-buffer
+                 (cl-letf (((symbol-function 'decode-coding-string)
+                            (lambda (str coding)
+                              (error "Decode failed"))))
+                   (insert "HTTP/1.1 200 OK\r\n\r\ndata")
+                   (funcall callback nil)))))
+            ((symbol-function 'kill-buffer) #'ignore))
+    (wttrin--fetch-url "http://example.com"
+                       (lambda (data)
+                         (should (null data))))))
+#+end_src
+
+**Problems**:
+1. **Mocks internal implementation** (`decode-coding-string`)
+2. **Doesn't match reality** (the real `url-retrieve` creates a different buffer context)
+3. **Breaks on refactoring** (even when behavior is unchanged)
+4. **Tests the mocks** (verifies mocked behavior, not production behavior)
+5. **Would pass if the function were deleted** (the mocks do all the work)
+
+**** How to Identify Overmocking
+
+Ask these questions (a properly scoped version of the test above follows this list):
+1. **Am I mocking internal functions?**
+   - Mocking `url-retrieve`: OK (external dependency)
+   - Mocking `decode-coding-string`: overmocked (internal implementation)
+
+2. **Would this test pass if I deleted the function body?**
+   - If yes, you're testing the mocks, not the code
+
+3. **Am I recreating the function's logic in mocks?**
+   - If the mocks contain the algorithm, you're not testing anything
+
+4. **Does the test break when I refactor without changing behavior?**
+   - Good tests survive refactoring
+   - Overmocked tests couple to implementation details
+
+5. **Is the mock more complex than the code being tested?**
+   - Red flag: test complexity should be much less than production complexity
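+
+For contrast, a sketch of the same concern tested at the right level,
+assuming the refactored functions above (no internals are mocked):
+#+begin_src elisp
+(require 'ert)
+
+(ert-deftest my-callback-gets-nil-on-bad-response-test ()
+  "When the response body can't be extracted, the callback gets nil."
+  (let (result called)
+    (with-temp-buffer
+      ;; Real buffer, real (malformed) data: no header/body separator.
+      (insert "garbage with no separator")
+      (handle-fetch-callback nil
+                             (lambda (data)
+                               (setq called t result data))))
+    (should called)
+    (should (null result))))
+
+(ert-deftest my-callback-gets-nil-on-network-error-test ()
+  "When STATUS carries an error, the callback gets nil."
+  (let (result called)
+    (handle-fetch-callback '(:error (error http 500))
+                           (lambda (data)
+                             (setq called t result data)))
+    (should called)
+    (should (null result))))
+#+end_src
+
+These tests survive any rewrite of the extraction internals because they
+only assert the observable contract.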
+
+**** The Fix: Refactor Code, Not Tests
+
+When you identify an overmocked test:
+
+**Option 1: Delete the test**
+- If it's testing implementation details, it has no value
+- If other tests cover the behavior, remove the duplication
+- Example: delete "test decoding error" - the network error tests are sufficient
+
+**Option 2: Refactor the code to be testable**
+- Extract the mocked logic into a separate function
+- Test that function in isolation with real data
+- Mock only at boundaries (network, file I/O, time)
+
+**Option 3: Convert to an integration test**
+- If testing component interaction matters
+- Use real implementations where possible
+- Mock only true external dependencies
+
+**** Mocking Guidelines
+
+**Mock these (external boundaries)**:
+- Network calls (`url-retrieve`, `request`, HTTP clients)
+- File system (`find-file`, `write-region`, `delete-file`)
+- Time (`current-time`, `float-time`)
+- User input (`read-string`, `yes-or-no-p`)
+- External processes (`call-process`, `start-process`)
+
+**Don't mock these (internal implementation)**:
+- String functions (`substring`, `concat`, `format`)
+- Encoding/decoding (`decode-coding-string`, `encode-coding-string`)
+- Data structures (`plist-get`, `alist-get`, `nth`)
+- Parsing logic (JSON, XML, org-mode)
+- Business logic (your domain code)
+
+**Gray area (context-dependent)**:
+- Buffer functions - mock if testing buffer management, use real buffers if testing content
+- Logging - usually OK to mock/disable for cleaner test output
+- Callbacks - mock if testing callback invocation, use real ones if testing data flow
+
+A boundary-level mock in practice is sketched below.
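+
+A sketch of mocking only at the boundary, assuming `cl-lib` and the
+refactored `complex-async-operation` above (everything inside the
+boundary runs for real):
+#+begin_src elisp
+(require 'ert)
+(require 'cl-lib)
+
+(ert-deftest my-unreachable-server-test ()
+  "Behavior contract: when the request fails, the callback gets nil."
+  (let (result called)
+    (cl-letf (((symbol-function 'url-retrieve)
+               ;; Boundary mock: stands in for the network and nothing
+               ;; else, invoking the callback the way url-retrieve does
+               ;; on a failed connection.
+               (lambda (_url cb &rest _)
+                 (with-temp-buffer
+                   (funcall cb '(:error (error connection-failed)))))))
+      (complex-async-operation "http://example.com"
+                               (lambda (data)
+                                 (setq called t result data))))
+    (should called)
+    (should (null result))))
+#+end_src
+
+Because only `url-retrieve` is replaced, `handle-fetch-callback` and its
+error handling run exactly as in production, so the test keeps passing
+across internal refactors.
+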
+**** Case Study: Deleting an Overmocked Test
+
+**The test** (from the wttrin refactoring):
+#+begin_src elisp
+(ert-deftest test-fetch-processing-error ()
+  (cl-letf (((symbol-function 'url-retrieve) ...)
+            ((symbol-function 'decode-coding-string)
+             (lambda (...) (error "Decoding error"))))
+    ...))
+#+end_src
+
+**Red flags**:
+- Mocks `decode-coding-string` (internal implementation)
+- The mock doesn't match the real buffer context
+- The test failed with "Marker does not point anywhere" (fighting the mocks, not fixing bugs)
+- Tested an edge case (decode failure) already covered by the network error tests
+
+**Decision**: **Delete it**
+- No behavior lost (network errors already tested)
+- Removed a maintenance burden (the test was coupled to the implementation)
+- We were fighting the test, not fixing real bugs
+
+**Alternative considered**: Refactor the code
+- But the network error coverage made this test redundant
+- Sometimes the right test count is fewer tests
+
+**** Key Insight: Test Behavior, Not Implementation
+
+Focus on **what** the code does, not **how** it does it.
+
+**Bad** (tests implementation):
+- "When decode-coding-string fails, callback gets nil"
+- "When buffer parsing throws an error, the buffer is cleaned up"
+- "When the regex match fails, the function returns nil"
+
+**Good** (tests behavior):
+- "When the network request fails, callback gets nil"
+- "When the URL returns invalid data, callback gets nil"
+- "When the server is unreachable, callback gets nil"
+
+The implementation might change (different parsing, different error handling), but the behavior contract stays the same.
+
 ## Red Flags
 
 Watch for and report these issues:
@@ -762,7 +1029,11 @@ Watch for and report these issues:
 - Tests where mocks do the actual work instead of the production code
 - Tests that would pass if the function implementation was deleted
 - Mocking data parsing/transformation when you should create real test data
+  - **Mocking internal implementation details** (`decode-coding-string`, buffer functions, string parsing)
+  - **Tests more complex than the code** (if the test setup is longer than the function, something's wrong)
+  - **Tests that break when refactoring** without behavior change (coupled to implementation)
 - Flaky tests that pass/fail intermittently
 - Tests that are too slow
 - Tests that require manual setup or verification
 - **Hardcoded dates in tests** - Will fail when dates pass, creates maintenance burden
+- **Deep nesting in production code** (7+ closing parens) - refactor, don't just test
