| author | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
| commit | 754bbf7a25a8dda49b5d08ef0d0443bbf5af0e36 (patch) | |
| tree | f1190704f78f04a2b0b4c977d20fe96a828377f1 /devdocs/elisp/parsing-html_002fxml.html | |
new repository
Diffstat (limited to 'devdocs/elisp/parsing-html_002fxml.html')
| -rw-r--r-- | devdocs/elisp/parsing-html_002fxml.html | 30 |
1 files changed, 30 insertions, 0 deletions
diff --git a/devdocs/elisp/parsing-html_002fxml.html b/devdocs/elisp/parsing-html_002fxml.html
new file mode 100644
index 00000000..d60e483f
--- /dev/null
+++ b/devdocs/elisp/parsing-html_002fxml.html
@@ -0,0 +1,30 @@
+ <h3 class="section">Parsing HTML and XML</h3> <p>Emacs can be compiled with built-in libxml2 support. </p> <dl> <dt id="libxml-available-p">Function: <strong>libxml-available-p</strong>
+</dt> <dd><p>This function returns non-<code>nil</code> if built-in libxml2 support is available in this Emacs session. </p></dd>
+</dl> <p>When libxml2 support is available, the following functions can be used to parse HTML or XML text into Lisp object trees. </p> <dl> <dt id="libxml-parse-html-region">Function: <strong>libxml-parse-html-region</strong> <em>start end &optional base-url discard-comments</em>
+</dt> <dd>
+<p>This function parses the text between <var>start</var> and <var>end</var> as HTML, and returns a list representing the HTML <em>parse tree</em>. It attempts to handle real-world HTML by robustly coping with syntax mistakes. </p> <p>The optional argument <var>base-url</var>, if non-<code>nil</code>, should be a string specifying the base URL for relative URLs occurring in links. </p> <p>If the optional argument <var>discard-comments</var> is non-<code>nil</code>, any top-level comment is discarded. (This argument is obsolete and will be removed in future Emacs versions. To remove comments, use the <code>xml-remove-comments</code> utility function on the data before you call the parsing function.) </p> <p>In the parse tree, each HTML node is represented by a list in which the first element is a symbol representing the node name, the second element is an alist of node attributes, and the remaining elements are the subnodes. </p> <p>The following example demonstrates this. Given this (malformed) HTML document: </p> <div class="example"> <pre class="example"><html><head></head><body width=101><div class=thing>Foo<div>Yes
+</pre>
+</div> <p>A call to <code>libxml-parse-html-region</code> returns this <acronym>DOM</acronym> (document object model): </p> <div class="example"> <pre class="example">(html nil
+ (head nil)
+ (body ((width . "101"))
+  (div ((class . "thing"))
+   "Foo"
+   (div nil
+    "Yes"))))
+</pre>
+</div> </dd>
+</dl> <dl> <dt id="shr-insert-document">Function: <strong>shr-insert-document</strong> <em>dom</em>
+</dt> <dd><p>This function renders the parsed HTML in <var>dom</var> into the current buffer. The argument <var>dom</var> should be a list as generated by <code>libxml-parse-html-region</code>. This function is, e.g., used by <a href="https://www.gnu.org/software/emacs/manual/html_node/eww/index.html#Top">EWW</a> in <cite>The Emacs Web Wowser Manual</cite>. </p></dd>
+</dl> <dl> <dt id="libxml-parse-xml-region">Function: <strong>libxml-parse-xml-region</strong> <em>start end &optional base-url discard-comments</em>
+</dt> <dd><p>This function is the same as <code>libxml-parse-html-region</code>, except that it parses the text as XML rather than HTML (so it is stricter about syntax). </p></dd>
+</dl> <table class="menu" border="0" cellspacing="0"> <tr>
+<td align="left" valign="top">• <a href="document-object-model" accesskey="1">Document Object Model</a>
+</td>
+<td> </td>
+<td align="left" valign="top">Access, manipulate and search the <acronym>DOM</acronym>. </td>
+</tr> </table><div class="_attribution">
+ <p class="_attribution-p">
+ Copyright © 1990-1996, 1998-2022 Free Software Foundation, Inc. <br>Licensed under the GNU GPL license.<br>
+ <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Parsing-HTML_002fXML.html" class="_attribution-link">https://www.gnu.org/software/emacs/manual/html_node/elisp/Parsing-HTML_002fXML.html</a>
+ </p>
+</div>
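
The node layout described in the added page (node name, then an alist of attributes, then the subnodes) can be explored with a short Emacs Lisp sketch. This is only an illustration, not part of the commit: the `my/` helper names are hypothetical, and the `fboundp` guard assumes that `libxml-available-p` may be missing on older Emacs builds. The tree at the end is the one from the page's own example, written as a literal so that the walk runs even without libxml2.

```elisp
;;; Sketch only.  The `my/' helpers are hypothetical names, not part of Emacs.

(defun my/parse-html-string (html)
  "Parse the string HTML and return its parse tree.
Return nil when this Emacs has no libxml2 support."
  ;; Assumption: `libxml-available-p' may be absent on older builds,
  ;; hence the `fboundp' guard before calling it.
  (when (and (fboundp 'libxml-available-p)
             (libxml-available-p))
    (with-temp-buffer
      (insert html)
      (libxml-parse-html-region (point-min) (point-max)))))

(defun my/node-text (node)
  "Return the concatenated text of NODE and all of its subnodes."
  (if (stringp node)
      node
    ;; A node is (NAME ATTRIBUTES . SUBNODES), so `cddr' yields the subnodes.
    (mapconcat #'my/node-text (cddr node) "")))

(defun my/node-attribute (node name)
  "Return the value of attribute NAME (a symbol) in NODE, or nil."
  (cdr (assq name (cadr node))))

;; The tree from the documented example, as a literal.
(let* ((tree '(html nil
                    (head nil)
                    (body ((width . "101"))
                          (div ((class . "thing"))
                               "Foo"
                               (div nil "Yes")))))
       (body (nth 3 tree)))
  (list (car body)                        ; => body
        (my/node-attribute body 'width)   ; => "101"
        (my/node-text tree)))             ; => "FooYes"
```

In real code the accessors from `dom.el` (for example `dom-tag`, `dom-attr` and `dom-children`) are likely a better fit than raw `car`/`cadr`/`cddr`; the hand-written versions here only make the list structure visible.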
