Diffstat (limited to 'devdocs/python~3.12/library%2Furllib.robotparser.html')
| -rw-r--r-- | devdocs/python~3.12/library%2Furllib.robotparser.html | 51 |
1 files changed, 0 insertions, 51 deletions
diff --git a/devdocs/python~3.12/library%2Furllib.robotparser.html b/devdocs/python~3.12/library%2Furllib.robotparser.html
deleted file mode 100644
index 116f14f1..00000000
--- a/devdocs/python~3.12/library%2Furllib.robotparser.html
+++ /dev/null
@@ -1,51 +0,0 @@
- <span id="urllib-robotparser-parser-for-robots-txt"></span><h1>urllib.robotparser — Parser for robots.txt</h1> <p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.12/Lib/urllib/robotparser.py">Lib/urllib/robotparser.py</a></p> <p>This module provides a single class, <a class="reference internal" href="#urllib.robotparser.RobotFileParser" title="urllib.robotparser.RobotFileParser"><code>RobotFileParser</code></a>, which answers questions about whether or not a particular user agent can fetch a URL on the web site that published the <code>robots.txt</code> file. For more details on the structure of <code>robots.txt</code> files, see <a class="reference external" href="http://www.robotstxt.org/orig.html">http://www.robotstxt.org/orig.html</a>.</p> <dl class="py class"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser">
-<code>class urllib.robotparser.RobotFileParser(url='')</code> </dt> <dd>
-<p>This class provides methods to read, parse and answer questions about the <code>robots.txt</code> file at <em>url</em>.</p> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.set_url">
-<code>set_url(url)</code> </dt> <dd>
-<p>Sets the URL referring to a <code>robots.txt</code> file.</p> </dd>
-</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.read">
-<code>read()</code> </dt> <dd>
-<p>Reads the <code>robots.txt</code> URL and feeds it to the parser.</p> </dd>
-</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.parse">
-<code>parse(lines)</code> </dt> <dd>
-<p>Parses the lines argument.</p> </dd>
-</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.can_fetch">
-<code>can_fetch(useragent, url)</code> </dt> <dd>
-<p>Returns <code>True</code> if the <em>useragent</em> is allowed to fetch the <em>url</em> according to the rules contained in the parsed <code>robots.txt</code> file.</p> </dd>
-</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.mtime">
-<code>mtime()</code> </dt> <dd>
-<p>Returns the time the <code>robots.txt</code> file was last fetched. This is useful for long-running web spiders that need to check for new <code>robots.txt</code> files periodically.</p> </dd>
-</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.modified">
-<code>modified()</code> </dt> <dd>
-<p>Sets the time the <code>robots.txt</code> file was last fetched to the current time.</p> </dd>
-</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.crawl_delay">
-<code>crawl_delay(useragent)</code> </dt> <dd>
-<p>Returns the value of the <code>Crawl-delay</code> parameter from <code>robots.txt</code> for the <em>useragent</em> in question. If there is no such parameter or it doesn’t apply to the <em>useragent</em> specified or the <code>robots.txt</code> entry for this parameter has invalid syntax, return <code>None</code>.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.6.</span></p> </div> </dd>
-</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.request_rate">
-<code>request_rate(useragent)</code> </dt> <dd>
-<p>Returns the contents of the <code>Request-rate</code> parameter from <code>robots.txt</code> as a <a class="reference internal" href="../glossary#term-named-tuple"><span class="xref std std-term">named tuple</span></a> <code>RequestRate(requests, seconds)</code>. If there is no such parameter or it doesn’t apply to the <em>useragent</em> specified or the <code>robots.txt</code> entry for this parameter has invalid syntax, return <code>None</code>.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.6.</span></p> </div> </dd>
-</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.robotparser.RobotFileParser.site_maps">
-<code>site_maps()</code> </dt> <dd>
-<p>Returns the contents of the <code>Sitemap</code> parameter from <code>robots.txt</code> in the form of a <a class="reference internal" href="stdtypes#list" title="list"><code>list()</code></a>. If there is no such parameter or the <code>robots.txt</code> entry for this parameter has invalid syntax, return <code>None</code>.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.8.</span></p> </div> </dd>
-</dl> </dd>
-</dl> <p>The following example demonstrates basic use of the <a class="reference internal" href="#urllib.robotparser.RobotFileParser" title="urllib.robotparser.RobotFileParser"><code>RobotFileParser</code></a> class:</p> <pre data-language="python">>>> import urllib.robotparser
->>> rp = urllib.robotparser.RobotFileParser()
->>> rp.set_url("http://www.musi-cal.com/robots.txt")
->>> rp.read()
->>> rrate = rp.request_rate("*")
->>> rrate.requests
-3
->>> rrate.seconds
-20
->>> rp.crawl_delay("*")
-6
->>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
-False
->>> rp.can_fetch("*", "http://www.musi-cal.com/")
-True
-</pre> <div class="_attribution">
- <p class="_attribution-p">
- © 2001–2023 Python Software Foundation<br>Licensed under the PSF License.<br>
- <a href="https://docs.python.org/3.12/library/urllib.robotparser.html" class="_attribution-link">https://docs.python.org/3.12/library/urllib.robotparser.html</a>
- </p>
-</div>
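The doctest in the deleted page fetches robots.txt over the network from www.musi-cal.com, so its output depends on that site still serving the same file. The following is a minimal offline sketch of the same `RobotFileParser` methods. It feeds lines straight to `parse()` instead of calling `set_url()` and `read()`. The robots.txt text, the host www.example.com, the agent name FigTree, and the `needs_refresh` helper with its one-day threshold are all hypothetical and are not part of the module. Based on the 3.12 implementation, `parse()` also stamps the time that `mtime()` reports. Until rules have been read or parsed, `can_fetch()` answers False for every URL.

```python
import time
import urllib.robotparser

# Hypothetical robots.txt; www.example.com and FigTree are placeholders.
ROBOTS_TXT = """\
User-agent: FigTree
Disallow: /tmp

User-agent: *
Disallow: /cgi-bin/
Crawl-delay: 6
Request-rate: 3/20

Sitemap: https://www.example.com/sitemap.xml
"""


def needs_refresh(rp, max_age=24 * 60 * 60):
    """Hypothetical helper: True once the parsed rules are older than max_age seconds."""
    return time.time() - rp.mtime() > max_age


rp = urllib.robotparser.RobotFileParser()

# Nothing read or parsed yet, so every URL is treated as disallowed.
print(rp.can_fetch("*", "https://www.example.com/"))  # False

# parse() accepts an iterable of lines and records the current time for mtime().
rp.parse(ROBOTS_TXT.splitlines())

# Rules from the "*" group.
print(rp.can_fetch("*", "https://www.example.com/cgi-bin/search?city=San+Francisco"))  # False
print(rp.can_fetch("*", "https://www.example.com/"))  # True
print(rp.crawl_delay("*"))  # 6
rate = rp.request_rate("*")
print(rate.requests, rate.seconds)  # 3 20

# The FigTree group applies instead of "*" for this agent.
print(rp.can_fetch("FigTree/1.0", "https://www.example.com/tmp/report"))  # False
print(rp.can_fetch("FigTree/1.0", "https://www.example.com/cgi-bin/search"))  # True
print(rp.crawl_delay("FigTree/1.0"))  # None

# Sitemap lines are collected regardless of which group they follow.
print(rp.site_maps())  # ['https://www.example.com/sitemap.xml']
print(needs_refresh(rp))  # False: the rules were parsed moments ago
```

A named group such as FigTree replaces the `*` group entirely for agents it matches. For that reason `crawl_delay("FigTree/1.0")` returns None even though the `*` group sets a delay. Within a group, the 3.12 parser applies the first matching Allow or Disallow line in file order rather than the longest match. Narrower Allow lines should therefore come before broader Disallow lines.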
