| author | Craig Jennings <c@cjennings.net> | 2026-02-03 08:13:01 -0600 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2026-02-03 08:13:01 -0600 |
| commit | 8af6ef2f8618687b414f9e6b064cf77b8333d73c (patch) | |
| tree | b4b1cf82b435e0d0b30cf12ba4ee9c47b43be4d7 /emojis/emojione-v2.2.6-22/1f303.png | |
| parent | 09cfcfd6826f9bc8b379dde88e1d9ca719c1bdb2 (diff) | |
perf(lorem-optimum): fix O(n²) tokenization algorithm
The tokenizer was creating substring copies on every iteration:
- (substring text pos (1+ pos)) for the whitespace check
- (substring text pos) for regex matching, which copies ALL remaining text
This caused tokenizing 10K words to take 727ms instead of 6ms (see the sketch below).
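A minimal sketch of the quadratic pattern, assuming a simple word tokenizer (the function name my/tokenize-slow and the exact regexps are illustrative; the actual lorem-optimum source is not shown here):

```elisp
(defun my/tokenize-slow (text)
  "Tokenize TEXT into words, allocating fresh substrings each step."
  (let ((pos 0)
        (len (length text))
        (tokens '()))
    (while (< pos len)
      ;; One-character substring copy just to test for whitespace.
      (if (string-match-p "[ \t\n]" (substring text pos (1+ pos)))
          (setq pos (1+ pos))
        ;; Copies ALL remaining text before running the regexp,
        ;; making each iteration O(n) and the whole loop O(n^2).
        (let ((rest (substring text pos)))
          (if (string-match "\\`\\w+" rest)
              (progn
                (push (match-string 0 rest) tokens)
                (setq pos (+ pos (match-end 0))))
            (setq pos (1+ pos))))))
    (nreverse tokens)))
```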
Fix: use string-match's START position parameter and check characters
directly with aref instead of creating substrings, as in the sketch below.
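A corresponding sketch of the fixed pattern, under the same illustrative assumptions (my/tokenize-fast is a hypothetical name):

```elisp
(defun my/tokenize-fast (text)
  "Tokenize TEXT into words without intermediate allocations."
  (let ((pos 0)
        (len (length text))
        (tokens '()))
    (while (< pos len)
      ;; Inspect the character in place instead of copying it out.
      (if (memq (aref text pos) '(?\s ?\t ?\n))
          (setq pos (1+ pos))
        ;; string-match's third argument starts the search at POS
        ;; inside TEXT itself, so nothing is copied.
        (if (and (string-match "\\w+" text pos)
                 (= (match-beginning 0) pos))
            (progn
              (push (match-string 0 text) tokens)
              (setq pos (match-end 0)))
          (setq pos (1+ pos)))))
    (nreverse tokens)))
```

The (= (match-beginning 0) pos) guard preserves the anchored behavior of the slow version's \` regexp, since string-match can start a search at START but cannot anchor a match there.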
Performance improvement:
- Tokenize 10K words: 727ms → 6ms (120x faster)
- Learn 10K words: 873ms → 15ms (59x faster)
- Learn 100K words: 70s → 208ms (341x faster)
Diffstat (limited to 'emojis/emojione-v2.2.6-22/1f303.png')
0 files changed, 0 insertions, 0 deletions
