<span id="xml-dom-pulldom-support-for-building-partial-dom-trees"></span><h1>xml.dom.pulldom — Support for building partial DOM trees</h1> <p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.12/Lib/xml/dom/pulldom.py">Lib/xml/dom/pulldom.py</a></p> <p>The <a class="reference internal" href="#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code>xml.dom.pulldom</code></a> module provides a “pull parser” which can also be asked to produce DOM-accessible fragments of the document where necessary. The basic concept involves pulling “events” from a stream of incoming XML and processing them. In contrast to SAX, which also employs an event-driven processing model but delivers events through callbacks, the user of a pull parser is responsible for explicitly pulling events from the stream, looping over those events until either processing is finished or an error condition occurs.</p> <div class="admonition warning"> <p class="admonition-title">Warning</p> <p>The <a class="reference internal" href="#module-xml.dom.pulldom" title="xml.dom.pulldom: Support for building partial DOM trees from SAX events."><code>xml.dom.pulldom</code></a> module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data, see <a class="reference internal" href="xml#xml-vulnerabilities"><span class="std std-ref">XML vulnerabilities</span></a>.</p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.7.1: </span>To increase security, the SAX parser no longer processes general external entities by default. To enable processing of external entities, pass in a custom parser instance:</p> <pre data-language="python">from xml.dom.pulldom import parse
from xml.sax import make_parser
from xml.sax.handler import feature_external_ges
parser = make_parser()
parser.setFeature(feature_external_ges, True)
parse(filename, parser=parser)
</pre> </div> <p>Example:</p> <pre data-language="python">from xml.dom import pulldom
doc = pulldom.parse('sales_items.xml')
for event, node in doc:
    if event == pulldom.START_ELEMENT and node.tagName == 'item':
        if int(node.getAttribute('price')) > 50:
            doc.expandNode(node)
            print(node.toxml())
</pre> <p><code>event</code> is a constant and can be one of:</p> <ul class="simple"> <li><code>START_ELEMENT</code></li> <li><code>END_ELEMENT</code></li> <li><code>COMMENT</code></li> <li><code>START_DOCUMENT</code></li> <li><code>END_DOCUMENT</code></li> <li><code>CHARACTERS</code></li> <li><code>PROCESSING_INSTRUCTION</code></li> <li><code>IGNORABLE_WHITESPACE</code></li> </ul> <p><code>node</code> is an object of type <code>xml.dom.minidom.Document</code>, <code>xml.dom.minidom.Element</code> or <code>xml.dom.minidom.Text</code>.</p> <p>Since the document is treated as a “flat” stream of events, the document “tree” is implicitly traversed and the desired elements are found regardless of their depth in the tree. In other words, one does not need to consider hierarchical issues such as recursive searching of the document nodes. If the context of elements is important, however, one must either maintain some context-related state (i.e. remember where one is in the document at any given point) or use the <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream.expandNode" title="xml.dom.pulldom.DOMEventStream.expandNode"><code>DOMEventStream.expandNode()</code></a> method and switch to DOM-related processing.</p> <dl class="py class"> <dt class="sig sig-object py" id="xml.dom.pulldom.PullDom">
<code>class xml.dom.pulldom.PullDOM(documentFactory=None)</code> </dt> <dd>
<p>Subclass of <a class="reference internal" href="xml.sax.handler#xml.sax.handler.ContentHandler" title="xml.sax.handler.ContentHandler"><code>xml.sax.handler.ContentHandler</code></a>.</p> </dd>
</dl> <dl class="py class"> <dt class="sig sig-object py" id="xml.dom.pulldom.SAX2DOM">
<code>class xml.dom.pulldom.SAX2DOM(documentFactory=None)</code> </dt> <dd>
<p>Subclass of <a class="reference internal" href="xml.sax.handler#xml.sax.handler.ContentHandler" title="xml.sax.handler.ContentHandler"><code>xml.sax.handler.ContentHandler</code></a>.</p> </dd>
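<dd>
<p>As a rough illustration of how this handler class can be driven directly, the sketch below passes a <code>SAX2DOM</code> instance to <code>xml.sax.parseString()</code> and then reads the handler's <code>document</code> attribute. That attribute is not described on this page; relying on it is an assumption based on the CPython implementation, and the sample XML is made up for the example.</p>

```python
import xml.sax
from xml.dom import pulldom

# Hypothetical sample input, used only for this sketch.
SAMPLE = b"<root><a>1</a><b/></root>"

# SAX2DOM is a ContentHandler, so any SAX parser can feed it events.
handler = pulldom.SAX2DOM()
xml.sax.parseString(SAMPLE, handler)

# Assumption: the handler keeps the tree it built in `handler.document`.
root = handler.document.documentElement
child_tags = [child.tagName for child in root.childNodes]
print(root.tagName, child_tags)
```

<p>Unlike iterating over a <code>DOMEventStream</code>, this builds the whole tree in memory, so it is only a reasonable choice for small documents.</p> </dd>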
</dl> <dl class="py function"> <dt class="sig sig-object py" id="xml.dom.pulldom.parse">
<code>xml.dom.pulldom.parse(stream_or_string, parser=None, bufsize=None)</code> </dt> <dd>
<p>Return a <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream" title="xml.dom.pulldom.DOMEventStream"><code>DOMEventStream</code></a> from the given input. <em>stream_or_string</em> may be either a file name or a file-like object. <em>parser</em>, if given, must be an <a class="reference internal" href="xml.sax.reader#xml.sax.xmlreader.XMLReader" title="xml.sax.xmlreader.XMLReader"><code>XMLReader</code></a> object. This function will change the document handler of the parser and activate namespace support; other parser configuration (like setting an entity resolver) must have been done in advance.</p> </dd>
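<dd>
<p>A minimal sketch of calling <code>parse()</code> with a file-like object instead of a file name. The in-memory <code>io.BytesIO</code> stream and the <code>item</code>/<code>price</code> names are illustrative assumptions that mirror the example near the top of this page.</p>

```python
import io
from xml.dom import pulldom

# Hypothetical in-memory document standing in for a real file.
stream = io.BytesIO(b"<items><item price='10'/><item price='75'/></items>")

prices = []
for event, node in pulldom.parse(stream):
    # Only the start of each <item> element is of interest here.
    if event == pulldom.START_ELEMENT and node.tagName == "item":
        prices.append(int(node.getAttribute("price")))

print(prices)
```
</dd>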
</dl> <p>If you have XML in a string, you can use the <a class="reference internal" href="#xml.dom.pulldom.parseString" title="xml.dom.pulldom.parseString"><code>parseString()</code></a> function instead:</p> <dl class="py function"> <dt class="sig sig-object py" id="xml.dom.pulldom.parseString">
<code>xml.dom.pulldom.parseString(string, parser=None)</code> </dt> <dd>
<p>Return a <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream" title="xml.dom.pulldom.DOMEventStream"><code>DOMEventStream</code></a> that represents the (Unicode) <em>string</em>.</p> </dd>
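<dd>
<p>For instance, the following sketch collects the character data of a small document held in a string. The sample XML is made up. Because a parser may deliver text in more than one <code>CHARACTERS</code> event, the pieces are joined rather than assumed to arrive whole.</p>

```python
from xml.dom import pulldom

# Hypothetical sample document held in a str.
xml_text = "<note><to>Ann</to><body>Hi</body></note>"

pieces = []
for event, node in pulldom.parseString(xml_text):
    if event == pulldom.CHARACTERS:
        # For CHARACTERS events the node is a Text node; .data holds the text.
        pieces.append(node.data)

text = "".join(pieces)
print(text)
```
</dd>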
</dl> <dl class="py data"> <dt class="sig sig-object py" id="xml.dom.pulldom.default_bufsize">
<code>xml.dom.pulldom.default_bufsize</code> </dt> <dd>
<p>Default value for the <em>bufsize</em> parameter to <a class="reference internal" href="#xml.dom.pulldom.parse" title="xml.dom.pulldom.parse"><code>parse()</code></a>.</p> <p>The value of this variable can be changed before calling <a class="reference internal" href="#xml.dom.pulldom.parse" title="xml.dom.pulldom.parse"><code>parse()</code></a> and the new value will take effect.</p> </dd>
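<dd>
<p>Instead of changing the module-level default, a buffer size can also be passed per call. The sketch below reads a made-up document in deliberately small 64-byte chunks; the value 64 is arbitrary and chosen only to force many reads.</p>

```python
import io
from xml.dom import pulldom

# Hypothetical document with 1000 empty <entry/> elements.
stream = io.BytesIO(b"<log>" + b"<entry/>" * 1000 + b"</log>")

# bufsize is the amount of data requested from the stream per read.
events = pulldom.parse(stream, bufsize=64)
entry_count = sum(
    1
    for event, node in events
    if event == pulldom.START_ELEMENT and node.tagName == "entry"
)
print(entry_count)
```
</dd>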
</dl> <section id="domeventstream-objects"> <span id="id1"></span><h2>DOMEventStream Objects</h2> <dl class="py class"> <dt class="sig sig-object py" id="xml.dom.pulldom.DOMEventStream">
<code>class xml.dom.pulldom.DOMEventStream(stream, parser, bufsize)</code> </dt> <dd>
<div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.11: </span>Support for the <a class="reference internal" href="../reference/datamodel#object.__getitem__" title="object.__getitem__"><code>__getitem__()</code></a> method has been removed.</p> </div> <dl class="py method"> <dt class="sig sig-object py" id="xml.dom.pulldom.DOMEventStream.getEvent">
<code>getEvent()</code> </dt> <dd>
<p>Return a tuple containing <em>event</em> and the current <em>node</em>. The node is an <code>xml.dom.minidom.Document</code> if event equals <code>START_DOCUMENT</code>, an <code>xml.dom.minidom.Element</code> if event equals <code>START_ELEMENT</code> or <code>END_ELEMENT</code>, or an <code>xml.dom.minidom.Text</code> if event equals <code>CHARACTERS</code>. The current node does not contain information about its children unless <a class="reference internal" href="#xml.dom.pulldom.DOMEventStream.expandNode" title="xml.dom.pulldom.DOMEventStream.expandNode"><code>expandNode()</code></a> is called.</p> </dd>
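<dd>
<p>A sketch of pulling events explicitly with <code>getEvent()</code> rather than a <code>for</code> loop. It assumes that <code>getEvent()</code> returns <code>None</code> once the input is exhausted; that matches the CPython implementation but is not stated in the description above. The sample XML is made up.</p>

```python
from xml.dom import pulldom

stream = pulldom.parseString("<a><b>text</b></a>")

seen = []
while True:
    item = stream.getEvent()
    if item is None:  # assumption: None signals the end of the input
        break
    event, node = item
    seen.append(event)

start_count = seen.count(pulldom.START_ELEMENT)
print(seen[0], start_count)
```
</dd>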
</dl> <dl class="py method"> <dt class="sig sig-object py" id="xml.dom.pulldom.DOMEventStream.expandNode">
<code>expandNode(node)</code> </dt> <dd>
<p>Expands all children of <em>node</em> into <em>node</em>. Example:</p> <pre data-language="python">from xml.dom import pulldom
xml = '&lt;html&gt;&lt;title&gt;Foo&lt;/title&gt; &lt;p&gt;Some text &lt;div&gt;and more&lt;/div&gt;&lt;/p&gt; &lt;/html&gt;'
doc = pulldom.parseString(xml)
for event, node in doc:
    if event == pulldom.START_ELEMENT and node.tagName == 'p':
        # Following statement only prints '&lt;p/&gt;'
        print(node.toxml())
        doc.expandNode(node)
        # Following statement prints node with all its children '&lt;p&gt;Some text &lt;div&gt;and more&lt;/div&gt;&lt;/p&gt;'
        print(node.toxml())
</pre> </dd>
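<dd>
<p>Once a node has been expanded, it can be queried with the usual DOM methods. The sketch below, using a made-up catalog, expands each <code>book</code> element and then looks up its <code>title</code> child with <code>getElementsByTagName()</code>. Iteration resumes after the expanded element, so each book is handled exactly once.</p>

```python
from xml.dom import pulldom

# Hypothetical sample data for this sketch.
xml_text = (
    "<catalog>"
    "<book id='1'><title>A</title><author>X</author></book>"
    "<book id='2'><title>B</title><author>Y</author></book>"
    "</catalog>"
)

titles = {}
doc = pulldom.parseString(xml_text)
for event, node in doc:
    if event == pulldom.START_ELEMENT and node.tagName == "book":
        # Pull the whole <book> subtree into `node`, then use DOM calls.
        doc.expandNode(node)
        title = node.getElementsByTagName("title")[0]
        titles[node.getAttribute("id")] = title.firstChild.data

print(titles)
```
</dd>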
</dl> <dl class="py method"> <dt class="sig sig-object py" id="xml.dom.pulldom.DOMEventStream.reset">
<code>reset()</code> </dt> <dd></dd>
</dl> </dd>
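<dd>
<p>The overview above notes that when the context of an element matters, the caller can maintain some context-related state while consuming the flat event stream. One possible way to do that, sketched here with made-up element names, is to keep a stack of the currently open tag names and test it when text arrives.</p>

```python
from xml.dom import pulldom

# Hypothetical document in which <name> appears in two different contexts.
xml_text = (
    "<shop>"
    "<item><name>Tea</name></item>"
    "<staff><name>Ann</name></staff>"
    "</shop>"
)

path = []        # tag names of the elements that are currently open
item_names = []  # text found only under item/name

for event, node in pulldom.parseString(xml_text):
    if event == pulldom.START_ELEMENT:
        path.append(node.tagName)
    elif event == pulldom.END_ELEMENT:
        path.pop()
    elif event == pulldom.CHARACTERS and path[-2:] == ["item", "name"]:
        item_names.append(node.data)

print(item_names)
```

<p>The alternative described in the overview, calling <code>expandNode()</code> on the enclosing element and switching to DOM processing, avoids the manual stack at the cost of building that subtree in memory.</p> </dd>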
</dl> </section> <div class="_attribution">
<p class="_attribution-p">
© 2001–2023 Python Software Foundation<br>Licensed under the PSF License.<br>
<a href="https://docs.python.org/3.12/library/xml.dom.pulldom.html" class="_attribution-link">https://docs.python.org/3.12/library/xml.dom.pulldom.html</a>
</p>
</div>