PYTHON TREE-SITTER FONT-LOCK PREDICATE MISMATCH — DIAGNOSIS (paused 2026-04-26)
================================================================================

STATUS
------
/start-work paused at Gate 2 (approach was being investigated).
Todo.org entry reverted to TODO. This diagnostic captures the full investigation
so the next session can pick up at decision-time without re-deriving the cause.

Linked todo.org entry:
  * TODO [#A] Fix Python tree-sitter font-lock query syntax error :bug:
  SCHEDULED: <2026-04-27 Mon>


THE BUG
-------
Every redisplay of a Python buffer signals:

  Error during redisplay: (jit-lock-function N) signaled
    (treesit-query-error "Syntax error at" 358 ...
     "Debug the query with `treesit-query-validate'")

The reported failing query is the keyword + self-as-keyword block from upstream
Emacs python.el at lines 1188-1190:

  `([,@python--treesit-keywords] @font-lock-keyword-face
    ((identifier) @font-lock-keyword-face
     (:match "\\`self\\'" @font-lock-keyword-face)))


ROOT CAUSE — VERSION MISMATCH (system-level, not in .emacs.d)
-------------------------------------------------------------
Emacs version:           30.2
tree-sitter library:     0.26.8 (/usr/lib/libtree-sitter.so.0.26)
tree-sitter grammar ABI: 15

Tree-sitter 0.26.x enforces that predicate names must end in "?" or "!" — for
example #match? or #any-of?. The unsuffixed form #match (without "?") is
rejected with a syntax error.

Emacs 30.2's treesit translates elisp predicate keywords to tree-sitter
predicate strings WITHOUT the "?" suffix:

  :match  →  #match    (rejected by tree-sitter 0.26)
  :equal  →  #equal    (rejected)
  :pred   →  #pred     (rejected)

And Emacs 30.2 ALSO rejects raw string queries that use the "?" form:

  Tested raw query "((identifier) @cap (#match? @cap \"^self$\"))"
  → Emacs error: "Invalid predicate match? Currently Emacs only supports
    `equal', `match', and `pred' predicates"

Both ends are strict and incompatible. There is no API path through Emacs
30.2's treesit that produces a query tree-sitter 0.26 will accept when any
predicate is used.
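
The translation can be inspected directly. This is a minimal sketch, assuming
`treesit-query-expand' converts a sexp query to the string form Emacs builds
for tree-sitter; the helper name my/treesit-show-expansion is hypothetical.

```elisp
(require 'treesit)

(defun my/treesit-show-expansion (sexp-query)
  "Return the string form Emacs generates for SEXP-QUERY.
Hypothetical helper; a thin wrapper around `treesit-query-expand'."
  (treesit-query-expand sexp-query))

;; The predicate name in the printed string shows whether this Emacs
;; build emits the suffixed or the unsuffixed form.
(message "%s"
         (my/treesit-show-expansion
          '(((identifier) @cap (:match "^self$" @cap)))))
```

Comparing the printed string before and after an Emacs upgrade shows whether
the predicate translation changed.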

This is NOT specific to the self-as-keyword line. It affects every :match,
:equal, :pred predicate in every language's treesit font-lock settings.
Python alone has six other :match predicates in python--treesit-settings
(features: builtin, type, function — at lines 1205, 1209, 1260, 1296, 1297,
1303 of python.el). Other treesit-aware modes (rust-ts, go-ts, c-ts, etc.)
likely have the same shape and may also be silently or loudly broken.

The keyword block is the loudest because it captures every identifier in the
buffer, so the predicate evaluation is forced on every redisplay. Other queries
with rarer captures may fail silently (one-off errors) or not fire often
enough to flood Messages.


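WHY THE KEYWORD QUERY LOOKED LIKE THE ONLY BREAK
------------------------------------------------
Working hypothesis at the time: predicates are validated lazily, only when a
predicate is actually evaluated against captured text. The keyword query
captures every identifier (very common), so its predicate evaluates per
redisplay → error per redisplay → flood. Builtin/type/function queries only
fire on more selective node types, so their errors fire less often and may not
be visible in the *Messages* sample we collected.

Caveat: the isolated tests under INVESTIGATION ARTIFACTS show a syntax error
raised when the query is parsed, before any predicate runs, which does not fit
a lazy-evaluation model. Treat this section as an unverified explanation for
the error frequency, not as part of the root cause.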


REPRODUCTION (verified with emacs --batch)
------------------------------------------
emacs --batch --eval "
(progn
  (require 'treesit)
  (require 'python)
  (with-temp-buffer
    (insert \"def foo(self): return self\")
    (treesit-parser-create 'python)
    (let ((q '((identifier) @cap (:match \"^self\$\" @cap))))
      (condition-case err
          (treesit-query-capture 'python q (point-min) (point-max))
        (error (message \"FAIL: %s\" err))))))"

Output: FAIL: (treesit-query-error Syntax error at 26
              (identifier) @cap (#match @cap "^self$") ...)


FIX OPTIONS (each with trade-off)
----------------------------------
A. WAIT FOR UPSTREAM EMACS FIX
   Most likely the right answer if Emacs 30.3 ships the predicate-suffix
   compatibility. Check pacman/news for emacs updates regularly. Zero local
   work; loses font-lock fidelity in the meantime.

B. DOWNGRADE TREE-SITTER LIBRARY
   Roll /usr/lib/libtree-sitter.so back to a pre-0.26 version that accepts
   unsuffixed predicates. System-wide change; affects every package that
   links tree-sitter. Risky — would need pinning in pacman to prevent
   re-upgrade. Reject unless we hit other breakages.

C. PATCH EMACS' treesit.c TO EMIT #match?
   The right structural fix. Requires building Emacs from source with a
   one-line patch (or applying via a custom AUR/PKGBUILD). Major effort,
   ongoing maintenance burden.

D. OVERRIDE python--treesit-settings IN .emacs.d
   After python.el loads, replace the broken queries with predicate-free
   variants (or omit the affected features entirely). Pros: local, surgical.
   Cons: loses self-as-keyword highlighting and any other predicate-using
   feature; must be redone for every treesit mode (rust-ts, go-ts, c-ts,
   typescript-ts, ...).

E. ADVISE treesit-query-compile / treesit-query-capture
   Wrap the C calls so #match becomes #match? in the query string before
   tree-sitter sees it. This only works if Emacs's predicate evaluator
   recognizes the "?" form when it later dispatches the predicate, and the
   "Invalid predicate match?" error suggests its dispatch table is keyed on
   the unsuffixed names only. The advice would therefore have to translate
   in both directions. Risky and untested.

F. PIN EMACS IN PACMAN AT 30.2 OR EARLIER + DOWNGRADE TREE-SITTER
   System-level pin both packages until upstream fixes land. Safe but
   stops other Emacs/tree-sitter package updates entirely.


RECOMMENDED PATH (subject to Craig's call)
------------------------------------------
1. Verify whether Emacs has a patch in 30.3 / master. Check the repositories
   (pacman -Ss emacs; pacman -Q only lists installed packages) for
   emacs-snapshot variants. Skim NEWS or git log for treesit predicate
   fixes if a build-from-source is on the table.

2. If no upstream fix is imminent: option D (override
   python--treesit-settings) for the immediate noise — a small,
   project-local patch in modules/prog-python.el that strips the keyword
   feature's :match sub-query. Loses self-as-keyword highlighting only;
   keeps the rest of Python font-lock working.

3. File this as a known bug + document the workaround in CLAUDE.md or a
   project note so it doesn't get re-investigated next time someone hits
   it.
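
Step 2 can be approximated without editing python--treesit-settings itself.
The sketch below is a coarser variant of option D: it switches off the whole
keyword feature through `treesit-font-lock-recompute-features', so it loses
all highlighting from that rule, not only self-as-keyword. It assumes a
disabled feature's query is never compiled or run. The function name is
hypothetical and the snippet has not been tested against the affected
versions.

```elisp
(require 'treesit)

(defun my/python-ts-drop-keyword-feature ()
  "Disable the `keyword' font-lock feature in the current buffer.
Hypothetical workaround hook for `python-ts-mode'."
  (when (fboundp 'treesit-font-lock-recompute-features)
    ;; First argument: features to add (none).
    ;; Second argument: features to remove.
    (treesit-font-lock-recompute-features nil '(keyword))))

(add-hook 'python-ts-mode-hook #'my/python-ts-drop-keyword-feature)
```

The builtin, type and function features also use :match, so they may need the
same treatment if their errors surface once the keyword flood is gone.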


VERIFICATION PATH (after a fix lands)
-------------------------------------
1. Restart Emacs.
2. Open any Python file from the dashboard MVP.
3. Watch *Messages* for "treesit-query-error" — should be zero.
4. Visually confirm Python keywords still highlight (def, class, return,
   import, etc.).
5. If using option D specifically: confirm "self" no longer renders as a
   keyword (expected loss).
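
Steps 2 and 3 can also be checked without watching *Messages*. This is a
minimal sketch, assuming `font-lock-ensure' propagates the query error to its
caller instead of swallowing it the way redisplay does; the helper name
my/python-ts-fontify-check is hypothetical.

```elisp
(require 'treesit)

(defun my/python-ts-fontify-check (file)
  "Fontify FILE under `python-ts-mode' and report any signaled error.
Return nil when fontification completes, the symbol `no-grammar'
when the Python grammar is unavailable, or the error object
otherwise.  Hypothetical helper, not part of Emacs."
  (if (not (treesit-ready-p 'python t))
      'no-grammar
    (with-temp-buffer
      (insert-file-contents file)
      (condition-case err
          (progn
            (python-ts-mode)
            ;; Force fontification of the whole buffer right now.
            (font-lock-ensure)
            nil)
        (error err)))))
```

A nil result from (my/python-ts-fontify-check "/path/to/file.py") means no
query error was signaled. Steps 4 and 5 still need a visual check.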


INVESTIGATION ARTIFACTS
-----------------------
Position 358 in the failing query string corresponds to the regex string
opening quote. Initially suspected a regex syntax problem (Emacs's \\`...\\'
anchors not in tree-sitter regex flavor). Disproved: the error fires even
with tree-sitter-friendly regex like ^self$, and even with :equal predicates
that don't compile a regex at all. Confirmed by isolating predicate variants:

  ((identifier) @cap)                                  → OK
  ((identifier) @cap (:match  "^self$"  @cap))         → FAIL at #match
  ((identifier) @cap (:equal  @cap "self"))            → FAIL at #equal
  ((identifier) @cap (:pred   (lambda ...) @cap))      → FAIL at #pred

  Raw string "((identifier) @cap (#match? @cap \"...\"))"
                                                       → FAIL Emacs (predicate ?)
  Raw string "((identifier) @cap (#match  @cap \"...\"))"
                                                       → FAIL tree-sitter

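Counting characters in the query string printed in the REPRODUCTION output,
"Syntax error at 26" lands immediately after the predicate name #match, not
at the predicate's opening "(". That is where a "?" or "!" suffix would be
expected, and it is consistent with the tree-sitter parser refusing the
unsuffixed predicate form before it ever evaluates anything.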
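
The variant table above can be re-run from a scratch buffer. This is a minimal
sketch: the helper names are hypothetical, and `identity' stands in for the
elided lambda in the :pred case.

```elisp
(require 'treesit)

(defun my/treesit-probe-query (query)
  "Run QUERY against a one-line Python buffer.
Return the symbol `ok' on success, or the error object on failure.
Hypothetical helper, not part of Emacs."
  (with-temp-buffer
    (insert "def foo(self): return self")
    (condition-case err
        (progn
          (treesit-parser-create 'python)
          (treesit-query-capture 'python query (point-min) (point-max))
          'ok)
      (error err))))

(defun my/treesit-probe-predicates ()
  "Probe a predicate-free query and one query per predicate keyword.
Return an alist of (VARIANT . RESULT)."
  (mapcar (lambda (entry)
            (cons (car entry) (my/treesit-probe-query (cdr entry))))
          '((none  . ((identifier) @cap))
            (match . (((identifier) @cap (:match "^self$" @cap))))
            (equal . (((identifier) @cap (:equal @cap "self"))))
            (pred  . (((identifier) @cap (:pred identity @cap)))))))
```

Evaluating (my/treesit-probe-predicates) returns one entry per variant. On the
affected versions only the `none' entry should be `ok'.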


SCOPE NOTE
----------
This is upstream / system-level. Not a .emacs.d bug. Three fix surfaces:
  - Emacs source (treesit.c predicate translation)
  - Tree-sitter library (predicate-suffix strictness)
  - Local override in .emacs.d (workaround only)

Fixing in .emacs.d is a workaround, not a root-cause fix. Document it as such.


RESOLVED 2026-05-14
-------------------
Bug no longer reproduces against the current versions:
  - emacs 30.2-3 (Arch package; upgraded from 30.2-2 on 2026-05-03)
  - tree-sitter 0.26.8-1 (unchanged from the original investigation)

The exact failing query from the investigation (python.el lines
1188-1190, the keyword + self-as-keyword block) now runs cleanly
under `treesit-query-capture'.  `font-lock-ensure' on a real Python
file under `python-ts-mode' completes with no `treesit-query-error'.
No local override applied to `modules/prog-python.el'.

The upstream Emacs version string is unchanged (30.2 in both), but
the Arch package revision bumped from -2 to -3 on 2026-05-03, most
likely carrying a downstream patch that fixed the treesit.c predicate
translation.  In effect this is option A from the fix-options list
above ("WAIT FOR UPSTREAM EMACS FIX"), except that the fix arrived
through the distribution package rather than a new upstream release.

If the flood ever returns, restart the investigation from the
REPRODUCTION block above against whichever emacs / tree-sitter
versions are then installed.
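
When restarting, record the versions first. This is a minimal sketch with a
hypothetical helper name. It reports ABI numbers only; it makes no attempt to
read the tree-sitter library release or the Arch package revision.

```elisp
(require 'treesit)

(defun my/treesit-version-report ()
  "Return a plist with the Emacs version and tree-sitter ABI numbers.
Hypothetical helper.  Entries are nil when the corresponding
function or the Python grammar is unavailable."
  (list :emacs emacs-version
        :library-abi (and (fboundp 'treesit-library-abi-version)
                          (treesit-library-abi-version))
        :python-grammar-abi (and (fboundp 'treesit-language-abi-version)
                                 (treesit-ready-p 'python t)
                                 (treesit-language-abi-version 'python))))
```

Pair the result of (my/treesit-version-report) with the output of
pacman -Q emacs tree-sitter, since the package revision is what changed
between the failing and the working setup.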