author: Craig Jennings <c@cjennings.net> 2024-04-07 13:41:34 -0500
committer: Craig Jennings <c@cjennings.net> 2024-04-07 13:41:34 -0500
commit: 754bbf7a25a8dda49b5d08ef0d0443bbf5af0e36 (patch)
tree: f1190704f78f04a2b0b4c977d20fe96a828377f1 /devdocs/elisp/parsing-html_002fxml.html
new repository
Diffstat (limited to 'devdocs/elisp/parsing-html_002fxml.html')
-rw-r--r-- devdocs/elisp/parsing-html_002fxml.html 30
1 files changed, 30 insertions, 0 deletions
diff --git a/devdocs/elisp/parsing-html_002fxml.html b/devdocs/elisp/parsing-html_002fxml.html
new file mode 100644
index 00000000..d60e483f
--- /dev/null
+++ b/devdocs/elisp/parsing-html_002fxml.html
@@ -0,0 +1,30 @@
+ <h3 class="section">Parsing HTML and XML</h3> <p>Emacs can be compiled with built-in libxml2 support. </p> <dl> <dt id="libxml-available-p">Function: <strong>libxml-available-p</strong>
+</dt> <dd><p>This function returns non-<code>nil</code> if built-in libxml2 support is available in this Emacs session. </p></dd>
+</dl> <p>When libxml2 support is available, the following functions can be used to parse HTML or XML text into Lisp object trees. </p> <dl> <dt id="libxml-parse-html-region">Function: <strong>libxml-parse-html-region</strong> <em>start end &amp;optional base-url discard-comments</em>
+</dt> <dd>
+<p>This function parses the text between <var>start</var> and <var>end</var> as HTML, and returns a list representing the HTML <em>parse tree</em>. It attempts to handle real-world HTML by robustly coping with syntax mistakes. </p> <p>The optional argument <var>base-url</var>, if non-<code>nil</code>, should be a string specifying the base URL for relative URLs occurring in links. </p> <p>If the optional argument <var>discard-comments</var> is non-<code>nil</code>, any top-level comment is discarded. (This argument is obsolete and will be removed in future Emacs versions. To remove comments, use the <code>xml-remove-comments</code> utility function on the data before you call the parsing function.) </p> <p>In the parse tree, each HTML node is represented by a list in which the first element is a symbol representing the node name, the second element is an alist of node attributes, and the remaining elements are the subnodes. </p> <p>The following example demonstrates this. Given this (malformed) HTML document: </p> <div class="example"> <pre class="example">&lt;html&gt;&lt;head&gt;&lt;/head&gt;&lt;body width=101&gt;&lt;div class=thing&gt;Foo&lt;div&gt;Yes
+</pre>
+</div> <p>A call to <code>libxml-parse-html-region</code> returns this <acronym>DOM</acronym> (document object model): </p> <div class="example"> <pre class="example">(html nil
+ (head nil)
+ (body ((width . "101"))
+ (div ((class . "thing"))
+ "Foo"
+ (div nil
+ "Yes"))))
+</pre>
+</div> </dd>
+</dl> <dl> <dt id="shr-insert-document">Function: <strong>shr-insert-document</strong> <em>dom</em>
+</dt> <dd><p>This function renders the parsed HTML in <var>dom</var> into the current buffer. The argument <var>dom</var> should be a list as generated by <code>libxml-parse-html-region</code>. For example, <a href="https://www.gnu.org/software/emacs/manual/html_node/eww/index.html#Top">EWW</a> (see <cite>The Emacs Web Wowser Manual</cite>) uses this function. </p></dd>
+</dl> <dl> <dt id="libxml-parse-xml-region">Function: <strong>libxml-parse-xml-region</strong> <em>start end &amp;optional base-url discard-comments</em>
+</dt> <dd><p>This function is the same as <code>libxml-parse-html-region</code>, except that it parses the text as XML rather than HTML (so it is stricter about syntax). </p></dd>
+</dl> <table class="menu" border="0" cellspacing="0"> <tr>
+<td align="left" valign="top">• <a href="document-object-model" accesskey="1">Document Object Model</a>
+</td>
+<td> </td>
+<td align="left" valign="top">Access, manipulate and search the <acronym>DOM</acronym>. </td>
+</tr> </table><div class="_attribution">
+ <p class="_attribution-p">
+ Copyright &copy; 1990-1996, 1998-2022 Free Software Foundation, Inc. <br>Licensed under the GNU GPL license.<br>
+ <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Parsing-HTML_002fXML.html" class="_attribution-link">https://www.gnu.org/software/emacs/manual/html_node/elisp/Parsing-HTML_002fXML.html</a>
+ </p>
+</div>