Diffstat (limited to 'devdocs/python~3.12/library%2Furllib.robotparser.html')
-rw-r--r-- devdocs/python~3.12/library%2Furllib.robotparser.html | 51
1 file changed, 51 insertions(+), 0 deletions(-)
diff --git a/devdocs/python~3.12/library%2Furllib.robotparser.html b/devdocs/python~3.12/library%2Furllib.robotparser.html
new file mode 100644
index 00000000..116f14f1
--- /dev/null
+++ b/devdocs/python~3.12/library%2Furllib.robotparser.html
@@ -0,0 +1,51 @@
+ <span id="urllib-robotparser-parser-for-robots-txt"></span><h1>urllib.robotparser — Parser for robots.txt</h1> <p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.12/Lib/urllib/robotparser.py">Lib/urllib/robotparser.py</a></p> <p>This module provides a single class, <a class="reference internal" href="#urllib.robotparser.RobotFileParser" title="urllib.robotparser.RobotFileParser"><code>RobotFileParser</code></a>, which answers questions about whether or not a particular user agent can fetch a URL on the web site that published the <code>robots.txt</code> file. For more details on the structure of <code>robots.txt</code> files, see <a class="reference external" href="http://www.robotstxt.org/orig.html">http://www.robotstxt.org/orig.html</a>.</p> <dl class="py class"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser">
+<code>class urllib.robotparser.RobotFileParser(url='')</code> </dt> <dd>
+<p>This class provides methods to read, parse and answer questions about the <code>robots.txt</code> file at <em>url</em>.</p> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.set_url">
+<code>set_url(url)</code> </dt> <dd>
+<p>Sets the URL referring to a <code>robots.txt</code> file.</p> </dd>
+</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.read">
+<code>read()</code> </dt> <dd>
+<p>Reads the <code>robots.txt</code> URL and feeds it to the parser.</p> </dd>
+</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.parse">
+<code>parse(lines)</code> </dt> <dd>
+<p>Parses the <em>lines</em> argument, a list of lines from a <code>robots.txt</code> file.</p> </dd>
+</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.can_fetch">
+<code>can_fetch(useragent, url)</code> </dt> <dd>
+<p>Returns <code>True</code> if the <em>useragent</em> is allowed to fetch the <em>url</em> according to the rules contained in the parsed <code>robots.txt</code> file.</p> </dd>
+</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.mtime">
+<code>mtime()</code> </dt> <dd>
+<p>Returns the time the <code>robots.txt</code> file was last fetched. This is useful for long-running web spiders that need to check for new <code>robots.txt</code> files periodically.</p> </dd>
+</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.modified">
+<code>modified()</code> </dt> <dd>
+<p>Sets the time the <code>robots.txt</code> file was last fetched to the current time.</p> </dd>
+</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.crawl_delay">
+<code>crawl_delay(useragent)</code> </dt> <dd>
+<p>Returns the value of the <code>Crawl-delay</code> parameter from <code>robots.txt</code> for the <em>useragent</em> in question. Returns <code>None</code> if there is no such parameter, if it doesn’t apply to the specified <em>useragent</em>, or if the <code>robots.txt</code> entry for this parameter has invalid syntax.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.6.</span></p> </div> </dd>
+</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.request_rate">
+<code>request_rate(useragent)</code> </dt> <dd>
+<p>Returns the contents of the <code>Request-rate</code> parameter from <code>robots.txt</code> as a <a class="reference internal" href="../glossary#term-named-tuple"><span class="xref std std-term">named tuple</span></a> <code>RequestRate(requests, seconds)</code>. Returns <code>None</code> if there is no such parameter, if it doesn’t apply to the specified <em>useragent</em>, or if the <code>robots.txt</code> entry for this parameter has invalid syntax.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.6.</span></p> </div> </dd>
+</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.site_maps">
+<code>site_maps()</code> </dt> <dd>
+<p>Returns the contents of the <code>Sitemap</code> parameter from <code>robots.txt</code> in the form of a <a class="reference internal" href="stdtypes#list" title="list"><code>list()</code></a>. Returns <code>None</code> if there is no such parameter or if the <code>robots.txt</code> entry for this parameter has invalid syntax.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.8.</span></p> </div> </dd>
+</dl> </dd>
+</dl> <p>The following example demonstrates basic use of the <a class="reference internal" href="#urllib.robotparser.RobotFileParser" title="urllib.robotparser.RobotFileParser"><code>RobotFileParser</code></a> class:</p> <pre data-language="python">&gt;&gt;&gt; import urllib.robotparser
+&gt;&gt;&gt; rp = urllib.robotparser.RobotFileParser()
+&gt;&gt;&gt; rp.set_url("http://www.musi-cal.com/robots.txt")
+&gt;&gt;&gt; rp.read()
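+&gt;&gt;&gt; # read() parses the fetched file and records the fetch time,
+&gt;&gt;&gt; # which mtime() reports (it stays 0 until a successful fetch)
+&gt;&gt;&gt; rp.mtime() &gt; 0
+True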
+&gt;&gt;&gt; rrate = rp.request_rate("*")
+&gt;&gt;&gt; rrate.requests
+3
+&gt;&gt;&gt; rrate.seconds
+20
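+&gt;&gt;&gt; # i.e. this user agent may make at most 3 requests every 20 seconds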
+&gt;&gt;&gt; rp.crawl_delay("*")
+6
+&gt;&gt;&gt; rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
+False
+&gt;&gt;&gt; rp.can_fetch("*", "http://www.musi-cal.com/")
+True
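+&gt;&gt;&gt; # A long-running spider can use mtime() to refresh a stale file:
+&gt;&gt;&gt; import time
+&gt;&gt;&gt; if time.time() - rp.mtime() &gt; 3600:  # assumed one-hour threshold
+...     rp.read()
+...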
+</pre> <div class="_attribution">
+ <p class="_attribution-p">
+ &copy; 2001&ndash;2023 Python Software Foundation<br>Licensed under the PSF License.<br>
+ <a href="https://docs.python.org/3.12/library/urllib.robotparser.html" class="_attribution-link">https://docs.python.org/3.12/library/urllib.robotparser.html</a>
+ </p>
+</div>