summaryrefslogtreecommitdiff
path: root/devdocs/elisp/extending-rx.html
diff options
context:
space:
mode:
authorCraig Jennings <c@cjennings.net>2024-04-07 13:41:34 -0500
committerCraig Jennings <c@cjennings.net>2024-04-07 13:41:34 -0500
commit754bbf7a25a8dda49b5d08ef0d0443bbf5af0e36 (patch)
treef1190704f78f04a2b0b4c977d20fe96a828377f1 /devdocs/elisp/extending-rx.html
new repository
Diffstat (limited to 'devdocs/elisp/extending-rx.html')
-rw-r--r--devdocs/elisp/extending-rx.html57
1 files changed, 57 insertions, 0 deletions
diff --git a/devdocs/elisp/extending-rx.html b/devdocs/elisp/extending-rx.html
new file mode 100644
index 00000000..ed09ee93
--- /dev/null
+++ b/devdocs/elisp/extending-rx.html
@@ -0,0 +1,57 @@
+ <h4 class="subsubsection">Defining new rx forms</h4> <p>The <code>rx</code> notation can be extended by defining new symbols and parameterized forms in terms of other <code>rx</code> expressions. This is handy for sharing parts between several regexps, and for making complex ones easier to build and understand by putting them together from smaller pieces. </p> <p>For example, you could define <code>name</code> to mean <code>(one-or-more letter)</code>, and <code>(quoted <var>x</var>)</code> to mean <code>(seq ?' <var>x</var> ?')</code> for any <var>x</var>. These forms could then be used in <code>rx</code> expressions like any other: <code>(rx (quoted name))</code> would match a nonempty sequence of letters inside single quotes. </p> <p>The Lisp macros below provide different ways of binding names to definitions. Common to all of them are the following rules: </p> <ul> <li> Built-in <code>rx</code> forms, like <code>digit</code> and <code>group</code>, cannot be redefined. </li>
+<li> The definitions live in a name space of their own, separate from that of Lisp variables. There is thus no need to attach a suffix like <code>-regexp</code> to names; they cannot collide with anything else. </li>
+<li> Definitions cannot refer to themselves recursively, directly or indirectly. If you find yourself needing this, you want a parser, not a regular expression. </li>
+<li> Definitions are only ever expanded in calls to <code>rx</code> or <code>rx-to-string</code>, not merely by their presence in definition macros. This means that the order of definitions doesn’t matter, even when they refer to each other, and that syntax errors only show up when they are used, not when they are defined. </li>
+<li> User-defined forms are allowed wherever arbitrary <code>rx</code> expressions are expected; for example, in the body of a <code>zero-or-one</code> form, but not inside <code>any</code> or <code>category</code> forms. They are also allowed inside <code>not</code> and <code>intersection</code> forms. </li>
+</ul> <dl> <dt id="rx-define">Macro: <strong>rx-define</strong> <em>name [arglist] rx-form</em>
+</dt> <dd>
+<p>Define <var>name</var> globally in all subsequent calls to <code>rx</code> and <code>rx-to-string</code>. If <var>arglist</var> is absent, then <var>name</var> is defined as a plain symbol to be replaced with <var>rx-form</var>. Example: </p> <div class="example"> <pre class="example">(rx-define haskell-comment (seq "--" (zero-or-more nonl)))
+(rx haskell-comment)
+ ⇒ "--.*"
+</pre>
+</div> <p>If <var>arglist</var> is present, it must be a list of zero or more argument names, and <var>name</var> is then defined as a parameterized form. When used in an <code>rx</code> expression as <code>(<var>name</var> <var>arg</var>…)</code>, each <var>arg</var> will replace the corresponding argument name inside <var>rx-form</var>. </p> <p><var>arglist</var> may end in <code>&amp;rest</code> and one final argument name, denoting a rest parameter. The rest parameter will expand to all extra actual argument values not matched by any other parameter in <var>arglist</var>, spliced into <var>rx-form</var> where it occurs. Example: </p> <div class="example"> <pre class="example">(rx-define moan (x y &amp;rest r) (seq x (one-or-more y) r "!"))
+(rx (moan "MOO" "A" "MEE" "OW"))
+ ⇒ "MOOA+MEEOW!"
+</pre>
+</div> <p>Since the definition is global, it is recommended to give <var>name</var> a package prefix to avoid name clashes with definitions elsewhere, as is usual when naming non-local variables and functions. </p> <p>Forms defined this way only perform simple template substitution. For arbitrary computations, use them together with the <code>rx</code> forms <code>eval</code>, <code>regexp</code> or <code>literal</code>. Example: </p> <div class="example"> <pre class="example">(defun n-tuple-rx (n element)
+ `(seq "&lt;"
+ (group-n 1 ,element)
+ ,@(mapcar (lambda (i) `(seq ?, (group-n ,i ,element)))
+ (number-sequence 2 n))
+ "&gt;"))
+(rx-define n-tuple (n element) (eval (n-tuple-rx n 'element)))
+(rx (n-tuple 3 (+ (in "0-9"))))
+ ⇒ "&lt;\\(?1:[0-9]+\\),\\(?2:[0-9]+\\),\\(?3:[0-9]+\\)&gt;"
+</pre>
+</div> </dd>
+</dl> <dl> <dt id="rx-let">Macro: <strong>rx-let</strong> <em>(bindings…) body…</em>
+</dt> <dd>
+<p>Make the <code>rx</code> definitions in <var>bindings</var> available locally for <code>rx</code> macro invocations in <var>body</var>, which is then evaluated. </p> <p>Each element of <var>bindings</var> is on the form <code>(<var>name</var> [<var>arglist</var>] <var><span class="nolinebreak">rx-form</span></var>)</code>, where the parts have the same meaning as in <code>rx-define</code> above. Example: </p> <div class="example"> <pre class="example">(rx-let ((comma-separated (item) (seq item (0+ "," item)))
+ (number (1+ digit))
+ (numbers (comma-separated number)))
+ (re-search-forward (rx "(" numbers ")")))
+</pre>
+</div> <p>The definitions are only available during the macro-expansion of <var>body</var>, and are thus not present during execution of compiled code. </p> <p><code>rx-let</code> can be used not only inside a function, but also at top level to include global variable and function definitions that need to share a common set of <code>rx</code> forms. Since the names are local inside <var>body</var>, there is no need for any package prefixes. Example: </p> <div class="example"> <pre class="example">(rx-let ((phone-number (seq (opt ?+) (1+ (any digit ?-)))))
+ (defun find-next-phone-number ()
+ (re-search-forward (rx phone-number)))
+ (defun phone-number-p (string)
+ (string-match-p (rx bos phone-number eos) string)))
+</pre>
+</div> <p>The scope of the <code>rx-let</code> bindings is lexical, which means that they are not visible outside <var>body</var> itself, even in functions called from <var>body</var>. </p>
+</dd>
+</dl> <dl> <dt id="rx-let-eval">Macro: <strong>rx-let-eval</strong> <em>bindings body…</em>
+</dt> <dd>
+<p>Evaluate <var>bindings</var> to a list of bindings as in <code>rx-let</code>, and evaluate <var>body</var> with those bindings in effect for calls to <code>rx-to-string</code>. </p> <p>This macro is similar to <code>rx-let</code>, except that the <var>bindings</var> argument is evaluated (and thus needs to be quoted if it is a list literal), and the definitions are substituted at run time, which is required for <code>rx-to-string</code> to work. Example: </p> <div class="example"> <pre class="example">(rx-let-eval
+ '((ponder (x) (seq "Where have all the " x " gone?")))
+ (looking-at (rx-to-string
+ '(ponder (or "flowers" "young girls"
+ "left socks")))))
+</pre>
+</div> <p>Another difference from <code>rx-let</code> is that the <var>bindings</var> are dynamically scoped, and thus also available in functions called from <var>body</var>. However, they are not visible inside functions defined in <var>body</var>. </p>
+</dd>
+</dl><div class="_attribution">
+ <p class="_attribution-p">
+ Copyright &copy; 1990-1996, 1998-2022 Free Software Foundation, Inc. <br>Licensed under the GNU GPL license.<br>
+ <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Extending-Rx.html" class="_attribution-link">https://www.gnu.org/software/emacs/manual/html_node/elisp/Extending-Rx.html</a>
+ </p>
+</div>