perf(lorem-optimum): fix O(n²) tokenization algorithm - dotemacs


author	Craig Jennings <c@cjennings.net>	2026-02-03 08:13:01 -0600
committer	Craig Jennings <c@cjennings.net>	2026-02-03 08:13:01 -0600
commit	8a202b2b97f7f34ce13d095e8663caaa4424b503 (patch)
tree	cb50d8168b94758411f57b2bfd892550d453e7b4 /modules/selection-framework.el
parent	c736265083dfa7f3a5acb61de289ac44ee921fb1 (diff)

perf(lorem-optimum): fix O(n²) tokenization algorithm

The tokenizer was creating substring copies on every iteration:
- (substring text pos (1+ pos)) for whitespace check
- (substring text pos) for regex matching - copies ALL remaining text

This caused 10K word tokenization to take 727ms instead of 6ms.

Fix: Use string-match with start position parameter and check
characters directly with aref instead of creating substrings.

Performance improvement:
- Tokenize 10K words: 727ms → 6ms (120x faster)
- Learn 10K words: 873ms → 15ms (59x faster)
- Learn 100K words: 70s → 208ms (341x faster)
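
The change described above can be illustrated with a minimal sketch. This is not the lorem-optimum code: the names my/token-regexp, my/tokenize-slow and my/tokenize-fast are hypothetical, and the token pattern (a word, or one of a few punctuation characters) is an assumption made only for the example. The slow version copies the remaining text on every step, so total work grows quadratically with input length. The fast version passes a START position to string-match and inspects single characters with aref, so it allocates only the tokens themselves.

```elisp
;;; -*- lexical-binding: t; -*-
;; Illustrative sketch only. All `my/' names are hypothetical and the
;; token pattern is an assumption, not taken from lorem-optimum.

(defconst my/token-regexp "[[:alnum:]']+\\|[.,!?;:]"
  "Assumed token pattern: a word, or one punctuation character.")

(defun my/tokenize-slow (text)
  "Tokenize TEXT, copying substrings on every iteration (quadratic)."
  (let ((pos 0) (len (length text)) (tokens '()))
    (while (< pos len)
      (if (string-match-p "[ \t\n\r]" (substring text pos (1+ pos)))
          ;; A one-character string was allocated just for this test.
          (setq pos (1+ pos))
        ;; Copies ALL remaining text before matching at its start.
        (let ((rest (substring text pos)))
          (if (string-match (concat "\\`\\(?:" my/token-regexp "\\)") rest)
              (progn (push (match-string 0 rest) tokens)
                     (setq pos (+ pos (match-end 0))))
            ;; Unknown character: skip it.
            (setq pos (1+ pos))))))
    (nreverse tokens)))

(defun my/tokenize-fast (text)
  "Tokenize TEXT without intermediate substring copies (linear)."
  (let ((pos 0) (len (length text)) (tokens '()))
    (while (< pos len)
      (if (memq (aref text pos) '(?\s ?\t ?\n ?\r))
          ;; Whitespace is checked in place; nothing is allocated.
          (setq pos (1+ pos))
        ;; Search the original string, starting at POS.
        (if (string-match my/token-regexp text pos)
            (progn (push (match-string 0 text) tokens)
                   ;; match-end is already an index into TEXT.
                   (setq pos (match-end 0)))
          ;; No token anywhere in the rest of TEXT: stop.
          (setq pos len))))
    (nreverse tokens)))

;; Both versions return the same tokens for the same input:
;; (my/tokenize-fast "Lorem ipsum, dolor.")
;; => ("Lorem" "ipsum" "," "dolor" ".")
```

In the fast version, string-match may land on a token that begins after POS, which skips any non-token characters in between in a single step. The slow version reaches the same result by advancing one character at a time.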

Diffstat (limited to 'modules/selection-framework.el')

0 files changed, 0 insertions, 0 deletions

