| author | Craig Jennings <c@cjennings.net> | 2026-02-03 08:13:01 -0600 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-02-03 08:13:01 -0600 |
| commit | 552410e64aa3c3cdef3e9485d782a722283f6b45 (patch) | |
| tree | 6e069eed5cf76ea43901593fe3e1b895b18fc1ad /tests | |
| parent | 431db43523604631bee3e72c6e53f5c752053ce2 (diff) | |
perf(lorem-optimum): fix O(n²) tokenization algorithm

The tokenizer was creating substring copies on every iteration:
- (substring text pos (1+ pos)) for the whitespace check
- (substring text pos) for regex matching, which copies ALL remaining text

Because the second copy is proportional to the remaining input, total
work grew quadratically with input size. This caused 10K word
tokenization to take 727ms instead of 6ms.

Fix: Use string-match with its start position parameter and check
characters directly with aref instead of creating substrings.

Performance improvement:
- Tokenize 10K words: 727ms → 6ms (120x faster)
- Learn 10K words: 873ms → 15ms (59x faster)
- Learn 100K words: 70s → 208ms (341x faster)
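
The change described above can be illustrated with a minimal sketch. This is not the lorem-optimum source: the function names (`my/tokenize-slow`, `my/tokenize-fast`) and the token regexp are hypothetical, chosen only to contrast the two access patterns named in the commit message. The slow variant copies the tail of the string at every token, while the fast variant reads characters with `aref` and passes a start position to `string-match`.

```elisp
;;; -*- lexical-binding: t; -*-
;; Illustrative sketch only. All my/ names and the regexp below are
;; assumptions, not taken from lorem-optimum.

(defconst my/token-regexp "[[:alnum:]]+\\|[^ \t\n\r]"
  "Hypothetical token pattern: a run of alphanumerics, or one other
non-whitespace character.")

(defun my/tokenize-slow (text)
  "Tokenize TEXT, copying substrings on every iteration (quadratic)."
  (let ((pos 0) (len (length text)) tokens)
    (while (< pos len)
      ;; Allocates a one-character string just to test for whitespace.
      (if (string-match-p "[ \t\n\r]" (substring text pos (1+ pos)))
          (setq pos (1+ pos))
        ;; Copies ALL remaining text before matching against it.
        (let ((rest (substring text pos)))
          (string-match my/token-regexp rest)
          (push (match-string 0 rest) tokens)
          ;; match-end is relative to the copy, so add it to pos.
          (setq pos (+ pos (match-end 0))))))
    (nreverse tokens)))

(defun my/tokenize-fast (text)
  "Tokenize TEXT without copying the tail of the string."
  (let ((pos 0) (len (length text)) tokens)
    (while (< pos len)
      ;; Inspect the character in place with aref.
      (if (memq (aref text pos) '(?\s ?\t ?\n ?\r))
          (setq pos (1+ pos))
        ;; Match against the original string, starting at pos.  The
        ;; character at pos is not whitespace, so the match begins there.
        (string-match my/token-regexp text pos)
        (push (match-string 0 text) tokens)
        ;; match-end is now an index into TEXT itself.
        (setq pos (match-end 0))))
    (nreverse tokens)))
```

Under these assumptions both variants return the same tokens, for example `("Hello" "," "world" "!" "foo")` for the input `"Hello, world!  foo"`. Only the amount of copying differs: the fast variant allocates just the token strings themselves.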
Diffstat (limited to 'tests')
0 files changed, 0 insertions, 0 deletions
