| author | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
| commit | 754bbf7a25a8dda49b5d08ef0d0443bbf5af0e36 (patch) | |
| tree | f1190704f78f04a2b0b4c977d20fe96a828377f1 /devdocs/python~3.12/library%2Furllib.request.html | |
new repository
Diffstat (limited to 'devdocs/python~3.12/library%2Furllib.request.html')
| -rw-r--r-- | devdocs/python~3.12/library%2Furllib.request.html | 407 |
1 files changed, 407 insertions, 0 deletions
diff --git a/devdocs/python~3.12/library%2Furllib.request.html b/devdocs/python~3.12/library%2Furllib.request.html new file mode 100644 index 00000000..7bcda2d5 --- /dev/null +++ b/devdocs/python~3.12/library%2Furllib.request.html @@ -0,0 +1,407 @@ + <span id="urllib-request-extensible-library-for-opening-urls"></span><h1>urllib.request — Extensible library for opening URLs</h1> <p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.12/Lib/urllib/request.py">Lib/urllib/request.py</a></p> <p>The <a class="reference internal" href="#module-urllib.request" title="urllib.request: Extensible library for opening URLs."><code>urllib.request</code></a> module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world — basic and digest authentication, redirections, cookies and more.</p> <div class="admonition seealso"> <p class="admonition-title">See also</p> <p>The <a class="reference external" href="https://requests.readthedocs.io/en/master/">Requests package</a> is recommended for a higher-level HTTP client interface.</p> </div> <div class="admonition warning"> <p class="admonition-title">Warning</p> <p>On macOS it is unsafe to use this module in programs using <a class="reference internal" href="os#os.fork" title="os.fork"><code>os.fork()</code></a> because the <a class="reference internal" href="#urllib.request.getproxies" title="urllib.request.getproxies"><code>getproxies()</code></a> implementation for macOS uses a higher-level system API. Set the environment variable <code>no_proxy</code> to <code>*</code> to avoid this problem (e.g. 
<code>os.environ["no_proxy"] = "*"</code>).</p> </div> <div class="availability docutils container"> <p><a class="reference internal" href="https://docs.python.org/3.12/library/intro.html#availability"><span class="std std-ref">Availability</span></a>: not Emscripten, not WASI.</p> <p>This module does not work or is not available on WebAssembly platforms <code>wasm32-emscripten</code> and <code>wasm32-wasi</code>. See <a class="reference internal" href="https://docs.python.org/3.12/library/intro.html#wasm-availability"><span class="std std-ref">WebAssembly platforms</span></a> for more information.</p> </div> <p>The <a class="reference internal" href="#module-urllib.request" title="urllib.request: Extensible library for opening URLs."><code>urllib.request</code></a> module defines the following functions:</p> <dl class="py function"> <dt class="sig sig-object py" id="urllib.request.urlopen"> +<code>urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)</code> </dt> <dd> +<p>Open <em>url</em>, which can be either a string containing a valid, properly encoded URL, or a <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> object.</p> <p><em>data</em> must be an object specifying additional data to be sent to the server, or <code>None</code> if no such data is needed. See <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> for details.</p> <p>urllib.request module uses HTTP/1.1 and includes <code>Connection:close</code> header in its HTTP requests.</p> <p>The optional <em>timeout</em> parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). 
This actually only works for HTTP, HTTPS and FTP connections.</p> <p>If <em>context</em> is specified, it must be a <a class="reference internal" href="ssl#ssl.SSLContext" title="ssl.SSLContext"><code>ssl.SSLContext</code></a> instance describing the various SSL options. See <a class="reference internal" href="http.client#http.client.HTTPSConnection" title="http.client.HTTPSConnection"><code>HTTPSConnection</code></a> for more details.</p> <p>The optional <em>cafile</em> and <em>capath</em> parameters specify a set of trusted CA certificates for HTTPS requests. <em>cafile</em> should point to a single file containing a bundle of CA certificates, whereas <em>capath</em> should point to a directory of hashed certificate files. More information can be found in <a class="reference internal" href="ssl#ssl.SSLContext.load_verify_locations" title="ssl.SSLContext.load_verify_locations"><code>ssl.SSLContext.load_verify_locations()</code></a>.</p> <p>The <em>cadefault</em> parameter is ignored.</p> <p>This function always returns an object which can work as a <a class="reference internal" href="../glossary#term-context-manager"><span class="xref std std-term">context manager</span></a> and has the properties <em>url</em>, <em>headers</em>, and <em>status</em>. See <a class="reference internal" href="#urllib.response.addinfourl" title="urllib.response.addinfourl"><code>urllib.response.addinfourl</code></a> for more detail on these properties.</p> <p>For HTTP and HTTPS URLs, this function returns a <a class="reference internal" href="http.client#http.client.HTTPResponse" title="http.client.HTTPResponse"><code>http.client.HTTPResponse</code></a> object slightly modified. 
In addition to the three new methods above, the msg attribute contains the same information as the <a class="reference internal" href="http.client#http.client.HTTPResponse.reason" title="http.client.HTTPResponse.reason"><code>reason</code></a> attribute — the reason phrase returned by server — instead of the response headers as it is specified in the documentation for <a class="reference internal" href="http.client#http.client.HTTPResponse" title="http.client.HTTPResponse"><code>HTTPResponse</code></a>.</p> <p>For FTP, file, and data URLs and requests explicitly handled by legacy <a class="reference internal" href="#urllib.request.URLopener" title="urllib.request.URLopener"><code>URLopener</code></a> and <a class="reference internal" href="#urllib.request.FancyURLopener" title="urllib.request.FancyURLopener"><code>FancyURLopener</code></a> classes, this function returns a <a class="reference internal" href="#urllib.response.addinfourl" title="urllib.response.addinfourl"><code>urllib.response.addinfourl</code></a> object.</p> <p>Raises <a class="reference internal" href="urllib.error#urllib.error.URLError" title="urllib.error.URLError"><code>URLError</code></a> on protocol errors.</p> <p>Note that <code>None</code> may be returned if no handler handles the request (though the default installed global <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a> uses <a class="reference internal" href="#urllib.request.UnknownHandler" title="urllib.request.UnknownHandler"><code>UnknownHandler</code></a> to ensure this never happens).</p> <p>In addition, if proxy settings are detected (for example, when a <code>*_proxy</code> environment variable like <span class="target" id="index-0"></span><code>http_proxy</code> is set), <a class="reference internal" href="#urllib.request.ProxyHandler" title="urllib.request.ProxyHandler"><code>ProxyHandler</code></a> is default installed and makes sure the 
requests are handled through the proxy.</p> <p>The legacy <code>urllib.urlopen</code> function from Python 2.6 and earlier has been discontinued; <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urllib.request.urlopen()</code></a> corresponds to the old <code>urllib2.urlopen</code>. Proxy handling, which was done by passing a dictionary parameter to <code>urllib.urlopen</code>, can be obtained by using <a class="reference internal" href="#urllib.request.ProxyHandler" title="urllib.request.ProxyHandler"><code>ProxyHandler</code></a> objects.</p> <p class="audit-hook"></p> +<p>The default opener raises an <a class="reference internal" href="sys#auditing"><span class="std std-ref">auditing event</span></a> <code>urllib.Request</code> with arguments <code>fullurl</code>, <code>data</code>, <code>headers</code>, <code>method</code> taken from the request object.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.2: </span><em>cafile</em> and <em>capath</em> were added.</p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.2: </span>HTTPS virtual hosts are now supported if possible (that is, if <a class="reference internal" href="ssl#ssl.HAS_SNI" title="ssl.HAS_SNI"><code>ssl.HAS_SNI</code></a> is true).</p> </div> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.2: </span><em>data</em> can be an iterable object.</p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.3: </span><em>cadefault</em> was added.</p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.4.3: </span><em>context</em> was added.</p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.10: </span>HTTPS connection now send an ALPN extension with protocol indicator <code>http/1.1</code> when no 
<em>context</em> is given. Custom <em>context</em> should set ALPN protocols with <code>set_alpn_protocols()</code>.</p> </div> <div class="deprecated"> <p><span class="versionmodified deprecated">Deprecated since version 3.6: </span><em>cafile</em>, <em>capath</em> and <em>cadefault</em> are deprecated in favor of <em>context</em>. Please use <a class="reference internal" href="ssl#ssl.SSLContext.load_cert_chain" title="ssl.SSLContext.load_cert_chain"><code>ssl.SSLContext.load_cert_chain()</code></a> instead, or let <a class="reference internal" href="ssl#ssl.create_default_context" title="ssl.create_default_context"><code>ssl.create_default_context()</code></a> select the system’s trusted CA certificates for you.</p> </div> </dd> +</dl> <dl class="py function"> <dt class="sig sig-object py" id="urllib.request.install_opener"> +<code>urllib.request.install_opener(opener)</code> </dt> <dd> +<p>Install an <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a> instance as the default global opener. Installing an opener is only necessary if you want <code>urlopen()</code> to use that opener; otherwise, simply call <a class="reference internal" href="#urllib.request.OpenerDirector.open" title="urllib.request.OpenerDirector.open"><code>OpenerDirector.open()</code></a> instead of <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a>. 
The code does not check for a real <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a>, and any class with the appropriate interface will work.</p> </dd> +</dl> <dl class="py function"> <dt class="sig sig-object py" id="urllib.request.build_opener"> +<code>urllib.request.build_opener([handler, ...])</code> </dt> <dd> +<p>Return an <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a> instance, which chains the handlers in the order given. <em>handler</em>s can be either instances of <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>, or subclasses of <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a> (in which case it must be possible to call the constructor without any parameters). 
Instances of the following classes will be in front of the <em>handler</em>s, unless the <em>handler</em>s contain them, instances of them or subclasses of them: <a class="reference internal" href="#urllib.request.ProxyHandler" title="urllib.request.ProxyHandler"><code>ProxyHandler</code></a> (if proxy settings are detected), <a class="reference internal" href="#urllib.request.UnknownHandler" title="urllib.request.UnknownHandler"><code>UnknownHandler</code></a>, <a class="reference internal" href="#urllib.request.HTTPHandler" title="urllib.request.HTTPHandler"><code>HTTPHandler</code></a>, <a class="reference internal" href="#urllib.request.HTTPDefaultErrorHandler" title="urllib.request.HTTPDefaultErrorHandler"><code>HTTPDefaultErrorHandler</code></a>, <a class="reference internal" href="#urllib.request.HTTPRedirectHandler" title="urllib.request.HTTPRedirectHandler"><code>HTTPRedirectHandler</code></a>, <a class="reference internal" href="#urllib.request.FTPHandler" title="urllib.request.FTPHandler"><code>FTPHandler</code></a>, <a class="reference internal" href="#urllib.request.FileHandler" title="urllib.request.FileHandler"><code>FileHandler</code></a>, <a class="reference internal" href="#urllib.request.HTTPErrorProcessor" title="urllib.request.HTTPErrorProcessor"><code>HTTPErrorProcessor</code></a>.</p> <p>If the Python installation has SSL support (i.e., if the <a class="reference internal" href="ssl#module-ssl" title="ssl: TLS/SSL wrapper for socket objects"><code>ssl</code></a> module can be imported), <a class="reference internal" href="#urllib.request.HTTPSHandler" title="urllib.request.HTTPSHandler"><code>HTTPSHandler</code></a> will also be added.</p> <p>A <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a> subclass may also change its <code>handler_order</code> attribute to modify its position in the handlers list.</p> </dd> +</dl> <dl class="py function"> <dt class="sig 
sig-object py" id="urllib.request.pathname2url"> +<code>urllib.request.pathname2url(path)</code> </dt> <dd> +<p>Convert the pathname <em>path</em> from the local syntax for a path to the form used in the path component of a URL. This does not produce a complete URL. The return value will already be quoted using the <a class="reference internal" href="urllib.parse#urllib.parse.quote" title="urllib.parse.quote"><code>quote()</code></a> function.</p> </dd> +</dl> <dl class="py function"> <dt class="sig sig-object py" id="urllib.request.url2pathname"> +<code>urllib.request.url2pathname(path)</code> </dt> <dd> +<p>Convert the path component <em>path</em> from a percent-encoded URL to the local syntax for a path. This does not accept a complete URL. This function uses <a class="reference internal" href="urllib.parse#urllib.parse.unquote" title="urllib.parse.unquote"><code>unquote()</code></a> to decode <em>path</em>.</p> </dd> +</dl> <dl class="py function"> <dt class="sig sig-object py" id="urllib.request.getproxies"> +<code>urllib.request.getproxies()</code> </dt> <dd> +<p>This helper function returns a dictionary of scheme to proxy server URL mappings. On all operating systems, it first scans the environment for variables named <code>&lt;scheme&gt;_proxy</code>, matching the names case-insensitively. If it finds none, it looks for proxy information in System Configuration on macOS and in the Windows Systems Registry on Windows. If both lowercase and uppercase environment variables exist (and disagree), lowercase is preferred.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>If the environment variable <code>REQUEST_METHOD</code> is set, which usually indicates your script is running in a CGI environment, the environment variable <code>HTTP_PROXY</code> (uppercase <code>_PROXY</code>) will be ignored. This is because that variable can be injected by a client using the “Proxy:” HTTP header. 
If you need to use an HTTP proxy in a CGI environment, either use <code>ProxyHandler</code> explicitly, or make sure the variable name is in lowercase (or at least the <code>_proxy</code> suffix).</p> </div> </dd> +</dl> <p>The following classes are provided:</p> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.Request"> +<code>class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)</code> </dt> <dd> +<p>This class is an abstraction of a URL request.</p> <p><em>url</em> should be a string containing a valid, properly encoded URL.</p> <p><em>data</em> must be an object specifying additional data to send to the server, or <code>None</code> if no such data is needed. Currently HTTP requests are the only ones that use <em>data</em>. The supported object types include bytes, file-like objects, and iterables of bytes-like objects. If no <code>Content-Length</code> nor <code>Transfer-Encoding</code> header field has been provided, <a class="reference internal" href="#urllib.request.HTTPHandler" title="urllib.request.HTTPHandler"><code>HTTPHandler</code></a> will set these headers according to the type of <em>data</em>. <code>Content-Length</code> will be used to send bytes objects, while <code>Transfer-Encoding: chunked</code> as specified in <span class="target" id="index-1"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc7230.html"><strong>RFC 7230</strong></a>, Section 3.3.1 will be used to send files and other iterables.</p> <p>For an HTTP POST request method, <em>data</em> should be a buffer in the standard <em class="mimetype">application/x-www-form-urlencoded</em> format. The <a class="reference internal" href="urllib.parse#urllib.parse.urlencode" title="urllib.parse.urlencode"><code>urllib.parse.urlencode()</code></a> function takes a mapping or sequence of 2-tuples and returns an ASCII string in this format. 
It should be encoded to bytes before being used as the <em>data</em> parameter.</p> <p><em>headers</em> should be a dictionary, and will be treated as if <a class="reference internal" href="#urllib.request.Request.add_header" title="urllib.request.Request.add_header"><code>add_header()</code></a> was called with each key and value as arguments. This is often used to “spoof” the <code>User-Agent</code> header value, which is used by a browser to identify itself – some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as <code>"Mozilla/5.0 +(X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"</code>, while <a class="reference internal" href="urllib#module-urllib" title="urllib"><code>urllib</code></a>’s default user agent string is <code>"Python-urllib/2.6"</code> (on Python 2.6). All header keys are sent in camel case.</p> <p>An appropriate <code>Content-Type</code> header should be included if the <em>data</em> argument is present. If this header has not been provided and <em>data</em> is not None, <code>Content-Type: application/x-www-form-urlencoded</code> will be added as a default.</p> <p>The next two arguments are only of interest for correct handling of third-party HTTP cookies:</p> <p><em>origin_req_host</em> should be the request-host of the origin transaction, as defined by <span class="target" id="index-2"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2965.html"><strong>RFC 2965</strong></a>. It defaults to <code>http.cookiejar.request_host(self)</code>. This is the host name or IP address of the original request that was initiated by the user. 
For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image.</p> <p><em>unverifiable</em> should indicate whether the request is unverifiable, as defined by <span class="target" id="index-3"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2965.html"><strong>RFC 2965</strong></a>. It defaults to <code>False</code>. An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document, and the user had no option to approve the automatic fetching of the image, this should be true.</p> <p><em>method</em> should be a string that indicates the HTTP request method that will be used (e.g. <code>'HEAD'</code>). If provided, its value is stored in the <a class="reference internal" href="#urllib.request.Request.method" title="urllib.request.Request.method"><code>method</code></a> attribute and is used by <a class="reference internal" href="#urllib.request.Request.get_method" title="urllib.request.Request.get_method"><code>get_method()</code></a>. The default is <code>'GET'</code> if <em>data</em> is <code>None</code> or <code>'POST'</code> otherwise. Subclasses may indicate a different default method by setting the <a class="reference internal" href="#urllib.request.Request.method" title="urllib.request.Request.method"><code>method</code></a> attribute in the class itself.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>The request will not work as expected if the data object is unable to deliver its content more than once (e.g. a file or an iterable that can produce the content only once) and the request is retried for HTTP redirects or authentication. The <em>data</em> is sent to the HTTP server right away after the headers. 
There is no support for a 100-continue expectation in the library.</p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.3: </span><a class="reference internal" href="#urllib.request.Request.method" title="urllib.request.Request.method"><code>Request.method</code></a> argument is added to the Request class.</p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.4: </span>Default <a class="reference internal" href="#urllib.request.Request.method" title="urllib.request.Request.method"><code>Request.method</code></a> may be indicated at the class level.</p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.6: </span>Do not raise an error if the <code>Content-Length</code> has not been provided and <em>data</em> is neither <code>None</code> nor a bytes object. Fall back to use chunked transfer encoding instead.</p> </div> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.OpenerDirector"> +<code>class urllib.request.OpenerDirector</code> </dt> <dd> +<p>The <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a> class opens URLs via <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>s chained together. 
It manages the chaining of handlers, and recovery from errors.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.BaseHandler"> +<code>class urllib.request.BaseHandler</code> </dt> <dd> +<p>This is the base class for all registered handlers; it handles only the simple mechanics of registration.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPDefaultErrorHandler"> +<code>class urllib.request.HTTPDefaultErrorHandler</code> </dt> <dd> +<p>A class which defines a default handler for HTTP error responses; all responses are turned into <a class="reference internal" href="urllib.error#urllib.error.HTTPError" title="urllib.error.HTTPError"><code>HTTPError</code></a> exceptions.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPRedirectHandler"> +<code>class urllib.request.HTTPRedirectHandler</code> </dt> <dd> +<p>A class to handle redirections.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPCookieProcessor"> +<code>class urllib.request.HTTPCookieProcessor(cookiejar=None)</code> </dt> <dd> +<p>A class to handle HTTP Cookies.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.ProxyHandler"> +<code>class urllib.request.ProxyHandler(proxies=None)</code> </dt> <dd> +<p>Cause requests to go through a proxy. If <em>proxies</em> is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables <code>&lt;protocol&gt;_proxy</code>. 
If no proxy environment variables are set, then in a Windows environment proxy settings are obtained from the registry’s Internet Settings section, and in a macOS environment proxy information is retrieved from the System Configuration Framework.</p> <p>To disable autodetected proxy pass an empty dictionary.</p> <p>The <span class="target" id="index-4"></span><code>no_proxy</code> environment variable can be used to specify hosts which shouldn’t be reached via proxy; if set, it should be a comma-separated list of hostname suffixes, optionally with <code>:port</code> appended, for example <code>cern.ch,ncsa.uiuc.edu,some.host:8080</code>.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p><code>HTTP_PROXY</code> will be ignored if a variable <code>REQUEST_METHOD</code> is set; see the documentation on <a class="reference internal" href="#urllib.request.getproxies" title="urllib.request.getproxies"><code>getproxies()</code></a>.</p> </div> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPPasswordMgr"> +<code>class urllib.request.HTTPPasswordMgr</code> </dt> <dd> +<p>Keep a database of <code>(realm, uri) -> (user, password)</code> mappings.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPPasswordMgrWithDefaultRealm"> +<code>class urllib.request.HTTPPasswordMgrWithDefaultRealm</code> </dt> <dd> +<p>Keep a database of <code>(realm, uri) -> (user, password)</code> mappings. 
A realm of <code>None</code> is considered a catch-all realm, which is searched if no other realm fits.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPPasswordMgrWithPriorAuth"> +<code>class urllib.request.HTTPPasswordMgrWithPriorAuth</code> </dt> <dd> +<p>A variant of <a class="reference internal" href="#urllib.request.HTTPPasswordMgrWithDefaultRealm" title="urllib.request.HTTPPasswordMgrWithDefaultRealm"><code>HTTPPasswordMgrWithDefaultRealm</code></a> that also has a database of <code>uri -&gt; is_authenticated</code> mappings. It can be used by a BasicAuth handler to determine when to send authentication credentials immediately instead of waiting for a <code>401</code> response first.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.5.</span></p> </div> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.AbstractBasicAuthHandler"> +<code>class urllib.request.AbstractBasicAuthHandler(password_mgr=None)</code> </dt> <dd> +<p>This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. <em>password_mgr</em>, if given, should be something that is compatible with <a class="reference internal" href="#urllib.request.HTTPPasswordMgr" title="urllib.request.HTTPPasswordMgr"><code>HTTPPasswordMgr</code></a>; refer to section <a class="reference internal" href="#http-password-mgr"><span class="std std-ref">HTTPPasswordMgr Objects</span></a> for information on the interface that must be supported. If <em>password_mgr</em> also provides <code>is_authenticated</code> and <code>update_authenticated</code> methods (see <a class="reference internal" href="#http-password-mgr-with-prior-auth"><span class="std std-ref">HTTPPasswordMgrWithPriorAuth Objects</span></a>), then the handler will use the <code>is_authenticated</code> result for a given URI to determine whether or not to send authentication credentials with the request. 
If <code>is_authenticated</code> returns <code>True</code> for the URI, credentials are sent. If <code>is_authenticated</code> is <code>False</code>, credentials are not sent, and then if a <code>401</code> response is received the request is re-sent with the authentication credentials. If authentication succeeds, <code>update_authenticated</code> is called to set <code>is_authenticated</code> <code>True</code> for the URI, so that subsequent requests to the URI or any of its super-URIs will automatically include the authentication credentials.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.5: </span>Added <code>is_authenticated</code> support.</p> </div> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPBasicAuthHandler"> +<code>class urllib.request.HTTPBasicAuthHandler(password_mgr=None)</code> </dt> <dd> +<p>Handle authentication with the remote host. <em>password_mgr</em>, if given, should be something that is compatible with <a class="reference internal" href="#urllib.request.HTTPPasswordMgr" title="urllib.request.HTTPPasswordMgr"><code>HTTPPasswordMgr</code></a>; refer to section <a class="reference internal" href="#http-password-mgr"><span class="std std-ref">HTTPPasswordMgr Objects</span></a> for information on the interface that must be supported. HTTPBasicAuthHandler will raise a <a class="reference internal" href="exceptions#ValueError" title="ValueError"><code>ValueError</code></a> when presented with a wrong Authentication scheme.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.ProxyBasicAuthHandler"> +<code>class urllib.request.ProxyBasicAuthHandler(password_mgr=None)</code> </dt> <dd> +<p>Handle authentication with the proxy. 
<em>password_mgr</em>, if given, should be something that is compatible with <a class="reference internal" href="#urllib.request.HTTPPasswordMgr" title="urllib.request.HTTPPasswordMgr"><code>HTTPPasswordMgr</code></a>; refer to section <a class="reference internal" href="#http-password-mgr"><span class="std std-ref">HTTPPasswordMgr Objects</span></a> for information on the interface that must be supported.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.AbstractDigestAuthHandler"> +<code>class urllib.request.AbstractDigestAuthHandler(password_mgr=None)</code> </dt> <dd> +<p>This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. <em>password_mgr</em>, if given, should be something that is compatible with <a class="reference internal" href="#urllib.request.HTTPPasswordMgr" title="urllib.request.HTTPPasswordMgr"><code>HTTPPasswordMgr</code></a>; refer to section <a class="reference internal" href="#http-password-mgr"><span class="std std-ref">HTTPPasswordMgr Objects</span></a> for information on the interface that must be supported.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPDigestAuthHandler"> +<code>class urllib.request.HTTPDigestAuthHandler(password_mgr=None)</code> </dt> <dd> +<p>Handle authentication with the remote host. <em>password_mgr</em>, if given, should be something that is compatible with <a class="reference internal" href="#urllib.request.HTTPPasswordMgr" title="urllib.request.HTTPPasswordMgr"><code>HTTPPasswordMgr</code></a>; refer to section <a class="reference internal" href="#http-password-mgr"><span class="std std-ref">HTTPPasswordMgr Objects</span></a> for information on the interface that must be supported. When both the Digest Authentication handler and the Basic Authentication handler are added, Digest Authentication is always tried first. 
If Digest Authentication results in a 40x response again, the request is passed to the Basic Authentication handler. This handler method will raise a <a class="reference internal" href="exceptions#ValueError" title="ValueError"><code>ValueError</code></a> when presented with an authentication scheme other than Digest or Basic.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.3: </span>Raise <a class="reference internal" href="exceptions#ValueError" title="ValueError"><code>ValueError</code></a> on unsupported Authentication Scheme.</p> </div> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.ProxyDigestAuthHandler"> +<code>class urllib.request.ProxyDigestAuthHandler(password_mgr=None)</code> </dt> <dd> +<p>Handle authentication with the proxy. <em>password_mgr</em>, if given, should be something that is compatible with <a class="reference internal" href="#urllib.request.HTTPPasswordMgr" title="urllib.request.HTTPPasswordMgr"><code>HTTPPasswordMgr</code></a>; refer to section <a class="reference internal" href="#http-password-mgr"><span class="std std-ref">HTTPPasswordMgr Objects</span></a> for information on the interface that must be supported.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPHandler"> +<code>class urllib.request.HTTPHandler</code> </dt> <dd> +<p>A class to handle opening of HTTP URLs.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPSHandler"> +<code>class urllib.request.HTTPSHandler(debuglevel=0, context=None, check_hostname=None)</code> </dt> <dd> +<p>A class to handle opening of HTTPS URLs. 
<em>context</em> and <em>check_hostname</em> have the same meaning as in <a class="reference internal" href="http.client#http.client.HTTPSConnection" title="http.client.HTTPSConnection"><code>http.client.HTTPSConnection</code></a>.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.2: </span><em>context</em> and <em>check_hostname</em> were added.</p> </div> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.FileHandler"> +<code>class urllib.request.FileHandler</code> </dt> <dd> +<p>Open local files.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.DataHandler"> +<code>class urllib.request.DataHandler</code> </dt> <dd> +<p>Open data URLs.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.4.</span></p> </div> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.FTPHandler"> +<code>class urllib.request.FTPHandler</code> </dt> <dd> +<p>Open FTP URLs.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.CacheFTPHandler"> +<code>class urllib.request.CacheFTPHandler</code> </dt> <dd> +<p>Open FTP URLs, keeping a cache of open FTP connections to minimize delays.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.UnknownHandler"> +<code>class urllib.request.UnknownHandler</code> </dt> <dd> +<p>A catch-all class to handle unknown URLs.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.HTTPErrorProcessor"> +<code>class urllib.request.HTTPErrorProcessor</code> </dt> <dd> +<p>Process HTTP error responses.</p> </dd> +</dl> <section id="request-objects"> <span id="id1"></span><h2>Request Objects</h2> <p>The following methods describe <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a>’s public interface, and so all may be 
overridden in subclasses. It also defines several public attributes that can be used by clients to inspect the parsed request.</p> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.Request.full_url"> +<code>Request.full_url</code> </dt> <dd> +<p>The original URL passed to the constructor.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.4.</span></p> </div> <p>Request.full_url is a property with setter, getter and a deleter. Getting <a class="reference internal" href="#urllib.request.Request.full_url" title="urllib.request.Request.full_url"><code>full_url</code></a> returns the original request URL with the fragment, if it was present.</p> </dd> +</dl> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.Request.type"> +<code>Request.type</code> </dt> <dd> +<p>The URI scheme.</p> </dd> +</dl> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.Request.host"> +<code>Request.host</code> </dt> <dd> +<p>The URI authority, typically a host, but may also contain a port separated by a colon.</p> </dd> +</dl> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.Request.origin_req_host"> +<code>Request.origin_req_host</code> </dt> <dd> +<p>The original host for the request, without port.</p> </dd> +</dl> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.Request.selector"> +<code>Request.selector</code> </dt> <dd> +<p>The URI path. 
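</p> <p>As a rough illustration of the attributes described in this section, the sketch below builds a <code>Request</code> for a made-up URL (the host <code>example.com</code>, the port and the path are placeholders, not taken from the text above) and reads the parsed pieces back. Constructing a <code>Request</code> does not touch the network.</p>

```python
from urllib.request import Request

# Hypothetical URL, chosen only so that every attribute has something to show.
req = Request("http://example.com:8080/a/b?x=1#frag")

parsed = {
    "full_url": req.full_url,                # original URL, fragment included
    "type": req.type,                        # URI scheme
    "host": req.host,                        # authority, port included
    "origin_req_host": req.origin_req_host,  # host without the port
    "selector": req.selector,                # path and query, no fragment
    "data": req.data,                        # None, since no body was given
}

for name, value in parsed.items():
    print(f"{name}: {value!r}")
```

<p>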
If the <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> uses a proxy, then selector will be the full URL that is passed to the proxy.</p> </dd> +</dl> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.Request.data"> +<code>Request.data</code> </dt> <dd> +<p>The entity body for the request, or <code>None</code> if not specified.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.4: </span>Changing value of <a class="reference internal" href="#urllib.request.Request.data" title="urllib.request.Request.data"><code>Request.data</code></a> now deletes “Content-Length” header if it was previously set or calculated.</p> </div> </dd> +</dl> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.Request.unverifiable"> +<code>Request.unverifiable</code> </dt> <dd> +<p>boolean, indicates whether the request is unverifiable as defined by <span class="target" id="index-5"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2965.html"><strong>RFC 2965</strong></a>.</p> </dd> +</dl> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.Request.method"> +<code>Request.method</code> </dt> <dd> +<p>The HTTP request method to use. By default its value is <a class="reference internal" href="constants#None" title="None"><code>None</code></a>, which means that <a class="reference internal" href="#urllib.request.Request.get_method" title="urllib.request.Request.get_method"><code>get_method()</code></a> will do its normal computation of the method to be used. 
Its value can be set (thus overriding the default computation in <a class="reference internal" href="#urllib.request.Request.get_method" title="urllib.request.Request.get_method"><code>get_method()</code></a>) either by providing a default value by setting it at the class level in a <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> subclass, or by passing a value in to the <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> constructor via the <em>method</em> argument.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.3.</span></p> </div> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.4: </span>A default value can now be set in subclasses; previously it could only be set via the constructor argument.</p> </div> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.Request.get_method"> +<code>Request.get_method()</code> </dt> <dd> +<p>Return a string indicating the HTTP request method. If <a class="reference internal" href="#urllib.request.Request.method" title="urllib.request.Request.method"><code>Request.method</code></a> is not <code>None</code>, return its value, otherwise return <code>'GET'</code> if <a class="reference internal" href="#urllib.request.Request.data" title="urllib.request.Request.data"><code>Request.data</code></a> is <code>None</code>, or <code>'POST'</code> if it’s not. 
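</p> <p>The selection rules can be checked without sending anything. In this minimal sketch the URL and the <code>HeadRequest</code> subclass are hypothetical names introduced for illustration only.</p>

```python
from urllib.request import Request


class HeadRequest(Request):
    """Hypothetical subclass that sets a class-level default method."""
    method = "HEAD"


url = "http://example.com/"  # placeholder URL; no request is sent

methods = {
    "no data": Request(url).get_method(),
    "with data": Request(url, data=b"k=v").get_method(),
    "explicit": Request(url, data=b"k=v", method="PUT").get_method(),
    "subclass default": HeadRequest(url).get_method(),
}

for case, method in methods.items():
    print(f"{case}: {method}")
```

<p>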
This is only meaningful for HTTP requests.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.3: </span>get_method now looks at the value of <a class="reference internal" href="#urllib.request.Request.method" title="urllib.request.Request.method"><code>Request.method</code></a>.</p> </div> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.Request.add_header"> +<code>Request.add_header(key, val)</code> </dt> <dd> +<p>Add another header to the request. Headers are currently ignored by all handlers except HTTP handlers, where they are added to the list of headers sent to the server. Note that there cannot be more than one header with the same name, and later calls will overwrite previous calls in case the <em>key</em> collides. Currently, this is no loss of HTTP functionality, since all headers which have meaning when used more than once have a (header-specific) way of gaining the same functionality using only one header. 
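</p> <p>A short sketch of the overwrite behaviour, using a placeholder URL and arbitrary header values. It only inspects the <code>Request</code> object and performs no I/O.</p>

```python
from urllib.request import Request

req = Request("http://example.com/")  # placeholder URL

req.add_header("Accept", "text/html")
req.add_header("Accept", "application/json")  # same key: replaces the earlier value

accept = req.get_header("Accept")
items = req.header_items()

print(accept)
print(items)
```

<p>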
Note that headers added using this method are also added to redirected requests.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.Request.add_unredirected_header"> +<code>Request.add_unredirected_header(key, header)</code> </dt> <dd> +<p>Add a header that will not be added to a redirected request.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.Request.has_header"> +<code>Request.has_header(header)</code> </dt> <dd> +<p>Return whether the instance has the named header (checks both regular and unredirected).</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.Request.remove_header"> +<code>Request.remove_header(header)</code> </dt> <dd> +<p>Remove named header from the request instance (both from regular and unredirected headers).</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.4.</span></p> </div> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.Request.get_full_url"> +<code>Request.get_full_url()</code> </dt> <dd> +<p>Return the URL given in the constructor.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.4.</span></p> </div> <p>Returns <a class="reference internal" href="#urllib.request.Request.full_url" title="urllib.request.Request.full_url"><code>Request.full_url</code></a></p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.Request.set_proxy"> +<code>Request.set_proxy(host, type)</code> </dt> <dd> +<p>Prepare the request by connecting to a proxy server. 
The <em>host</em> and <em>type</em> will replace those of the instance, and the instance’s selector will be the original URL given in the constructor.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.Request.get_header"> +<code>Request.get_header(header_name, default=None)</code> </dt> <dd> +<p>Return the value of the given header. If the header is not present, return the default value.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.Request.header_items"> +<code>Request.header_items()</code> </dt> <dd> +<p>Return a list of tuples (header_name, header_value) of the Request headers.</p> </dd> +</dl> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.4: </span>The request methods add_data, has_data, get_data, get_type, get_host, get_selector, get_origin_req_host and is_unverifiable that were deprecated since 3.3 have been removed.</p> </div> </section> <section id="openerdirector-objects"> <span id="opener-director-objects"></span><h2>OpenerDirector Objects</h2> <p><a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a> instances have the following methods:</p> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.OpenerDirector.add_handler"> +<code>OpenerDirector.add_handler(handler)</code> </dt> <dd> +<p><em>handler</em> should be an instance of <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>. The following methods are searched, and added to the possible chains (note that HTTP errors are a special case). Note that, in the following, <em>protocol</em> should be replaced with the actual protocol to handle, for example <code>http_response()</code> would be the HTTP protocol response handler. 
Also <em>type</em> should be replaced with the actual HTTP code, for example <code>http_error_404()</code> would handle HTTP 404 errors.</p> <ul> <li> +<p><code><protocol>_open()</code> — signal that the handler knows how to open <em>protocol</em> URLs.</p> <p>See <a class="reference internal" href="#protocol-open"><code>BaseHandler.<protocol>_open()</code></a> for more information.</p> </li> <li> +<p><code>http_error_<type>()</code> — signal that the handler knows how to handle HTTP errors with HTTP error code <em>type</em>.</p> <p>See <a class="reference internal" href="#http-error-nnn"><code>BaseHandler.http_error_<nnn>()</code></a> for more information.</p> </li> <li> +<code><protocol>_error()</code> — signal that the handler knows how to handle errors from (non-<code>http</code>) <em>protocol</em>.</li> <li> +<p><code><protocol>_request()</code> — signal that the handler knows how to pre-process <em>protocol</em> requests.</p> <p>See <a class="reference internal" href="#protocol-request"><code>BaseHandler.<protocol>_request()</code></a> for more information.</p> </li> <li> +<p><code><protocol>_response()</code> — signal that the handler knows how to post-process <em>protocol</em> responses.</p> <p>See <a class="reference internal" href="#protocol-response"><code>BaseHandler.<protocol>_response()</code></a> for more information.</p> </li> </ul> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.OpenerDirector.open"> +<code>OpenerDirector.open(url, data=None[, timeout])</code> </dt> <dd> +<p>Open the given <em>url</em> (which can be a request object or a string), optionally passing the given <em>data</em>. 
Arguments, return values and exceptions raised are the same as those of <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a> (which simply calls the <a class="reference internal" href="functions#open" title="open"><code>open()</code></a> method on the currently installed global <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a>). The optional <em>timeout</em> parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). The timeout feature actually works only for HTTP, HTTPS and FTP connections.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.OpenerDirector.error"> +<code>OpenerDirector.error(proto, *args)</code> </dt> <dd> +<p>Handle an error of the given protocol. This will call the registered error handlers for the given protocol with the given arguments (which are protocol specific). The HTTP protocol is a special case which uses the HTTP response code to determine the specific error handler; refer to the <code>http_error_<type>()</code> methods of the handler classes.</p> <p>Return values and exceptions raised are the same as those of <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a>.</p> </dd> +</dl> <p>OpenerDirector objects open URLs in three stages:</p> <p>The order in which these methods are called within each stage is determined by sorting the handler instances.</p> <ol class="arabic"> <li>Every handler with a method named like <code><protocol>_request()</code> has that method called to pre-process the request.</li> <li> +<p>Handlers with a method named like <code><protocol>_open()</code> are called to handle the request. 
This stage ends when a handler either returns a non-<a class="reference internal" href="constants#None" title="None"><code>None</code></a> value (i.e. a response), or raises an exception (usually <a class="reference internal" href="urllib.error#urllib.error.URLError" title="urllib.error.URLError"><code>URLError</code></a>). Exceptions are allowed to propagate.</p> <p>In fact, the above algorithm is first tried for methods named <code>default_open()</code>. If all such methods return <a class="reference internal" href="constants#None" title="None"><code>None</code></a>, the algorithm is repeated for methods named like <code><protocol>_open()</code>. If all such methods return <a class="reference internal" href="constants#None" title="None"><code>None</code></a>, the algorithm is repeated for methods named <code>unknown_open()</code>.</p> <p>Note that the implementation of these methods may involve calls of the parent <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a> instance’s <a class="reference internal" href="#urllib.request.OpenerDirector.open" title="urllib.request.OpenerDirector.open"><code>open()</code></a> and <a class="reference internal" href="#urllib.request.OpenerDirector.error" title="urllib.request.OpenerDirector.error"><code>error()</code></a> methods.</p> </li> <li>Every handler with a method named like <code><protocol>_response()</code> has that method called to post-process the response.</li> </ol> </section> <section id="basehandler-objects"> <span id="base-handler-objects"></span><h2>BaseHandler Objects</h2> <p><a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a> objects provide a couple of methods that are directly useful, and others that are meant to be used by derived classes. 
These are intended for direct use:</p> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.BaseHandler.add_parent"> +<code>BaseHandler.add_parent(director)</code> </dt> <dd> +<p>Add a director as parent.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.BaseHandler.close"> +<code>BaseHandler.close()</code> </dt> <dd> +<p>Remove any parents.</p> </dd> +</dl> <p>The following attribute and methods should only be used by classes derived from <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>The convention has been adopted that subclasses defining <code><protocol>_request()</code> or <code><protocol>_response()</code> methods are named <code>*Processor</code>; all others are named <code>*Handler</code>.</p> </div> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.BaseHandler.parent"> +<code>BaseHandler.parent</code> </dt> <dd> +<p>A valid <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a>, which can be used to open using a different protocol, or handle errors.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.BaseHandler.default_open"> +<code>BaseHandler.default_open(req)</code> </dt> <dd> +<p>This method is <em>not</em> defined in <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>, but subclasses should define it if they want to catch all URLs.</p> <p>This method, if implemented, will be called by the parent <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a>. 
It should return a file-like object as described in the return value of the <a class="reference internal" href="#urllib.request.OpenerDirector.open" title="urllib.request.OpenerDirector.open"><code>open()</code></a> method of <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a>, or <code>None</code>. It should raise <a class="reference internal" href="urllib.error#urllib.error.URLError" title="urllib.error.URLError"><code>URLError</code></a>, unless a truly exceptional thing happens (for example, <a class="reference internal" href="exceptions#MemoryError" title="MemoryError"><code>MemoryError</code></a> should not be mapped to <code>URLError</code>).</p> <p>This method will be called before any protocol-specific open method.</p> </dd> +</dl> <span class="target" id="protocol-open"></span><dl class="py method"> <dt class="sig sig-object py"> <span class="sig-name descname">BaseHandler.<protocol>_open(req)</span> +</dt> <dd> +<p>This method is <em>not</em> defined in <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>, but subclasses should define it if they want to handle URLs with the given protocol.</p> <p>This method, if defined, will be called by the parent <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a>. 
Return values should be the same as for <code>default_open()</code>.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.BaseHandler.unknown_open"> +<code>BaseHandler.unknown_open(req)</code> </dt> <dd> +<p>This method is <em>not</em> defined in <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>, but subclasses should define it if they want to catch all URLs with no specific registered handler to open it.</p> <p>This method, if implemented, will be called by the <a class="reference internal" href="#urllib.request.BaseHandler.parent" title="urllib.request.BaseHandler.parent"><code>parent</code></a> <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a>. Return values should be the same as for <a class="reference internal" href="#urllib.request.BaseHandler.default_open" title="urllib.request.BaseHandler.default_open"><code>default_open()</code></a>.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.BaseHandler.http_error_default"> +<code>BaseHandler.http_error_default(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>This method is <em>not</em> defined in <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>, but subclasses should override it if they intend to provide a catch-all for otherwise unhandled HTTP errors. 
It will be called automatically by the <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a> getting the error, and should not normally be called in other circumstances.</p> <p><em>req</em> will be a <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> object, <em>fp</em> will be a file-like object with the HTTP error body, <em>code</em> will be the three-digit code of the error, <em>msg</em> will be the user-visible explanation of the code and <em>hdrs</em> will be a mapping object with the headers of the error.</p> <p>Return values and exceptions raised should be the same as those of <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a>.</p> </dd> +</dl> <span class="target" id="http-error-nnn"></span><dl class="py method"> <dt class="sig sig-object py"> <span class="sig-name descname">BaseHandler.http_error_<nnn>(req, fp, code, msg, hdrs)</span> +</dt> <dd> +<p><em>nnn</em> should be a three-digit HTTP error code. 
This method is also not defined in <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>, but will be called, if it exists, on an instance of a subclass, when an HTTP error with code <em>nnn</em> occurs.</p> <p>Subclasses should override this method to handle specific HTTP errors.</p> <p>Arguments, return values and exceptions raised should be the same as for <code>http_error_default()</code>.</p> </dd> +</dl> <span class="target" id="protocol-request"></span><dl class="py method"> <dt class="sig sig-object py"> <span class="sig-name descname">BaseHandler.<protocol>_request(req)</span> +</dt> <dd> +<p>This method is <em>not</em> defined in <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>, but subclasses should define it if they want to pre-process requests of the given protocol.</p> <p>This method, if defined, will be called by the parent <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a>. <em>req</em> will be a <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> object. 
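</p> <p>One way to see the pre-processing hook in action is to pair it with a catch-all <code>default_open()</code> on a bare <code>OpenerDirector</code>, so that nothing is sent over the network. This is only a sketch: the class names <code>TagRequestProcessor</code> and <code>CannedHandler</code>, the <code>X-Example</code> and <code>X-Seen</code> headers, and the URL are all hypothetical, and the canned response is built with <code>urllib.response.addinfourl</code> purely for convenience.</p>

```python
import io
from urllib.request import BaseHandler, OpenerDirector
from urllib.response import addinfourl


class TagRequestProcessor(BaseHandler):
    """Hypothetical processor: tags every HTTP request before it is opened."""

    def http_request(self, req):
        req.add_header("X-Example", "demo")
        return req  # hand a Request back to the OpenerDirector


class CannedHandler(BaseHandler):
    """Hypothetical handler: answers any URL locally, with no network I/O."""

    def default_open(self, req):
        body = io.BytesIO(b"canned response")
        # Request stores header names capitalized, hence "X-example".
        headers = {"X-Seen": req.get_header("X-example", "")}
        return addinfourl(body, headers, req.full_url, 200)


# A bare OpenerDirector has no default handlers, so only ours run.
opener = OpenerDirector()
opener.add_handler(TagRequestProcessor())
opener.add_handler(CannedHandler())

with opener.open("http://example.invalid/") as response:
    status = response.status
    seen = response.headers["X-Seen"]
    payload = response.read()

print(status, seen, payload)
```

<p>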
The return value should be a <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> object.</p> </dd> +</dl> <span class="target" id="protocol-response"></span><dl class="py method"> <dt class="sig sig-object py"> <span class="sig-name descname">BaseHandler.<protocol>_response(req, response)</span> +</dt> <dd> +<p>This method is <em>not</em> defined in <a class="reference internal" href="#urllib.request.BaseHandler" title="urllib.request.BaseHandler"><code>BaseHandler</code></a>, but subclasses should define it if they want to post-process responses of the given protocol.</p> <p>This method, if defined, will be called by the parent <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a>. <em>req</em> will be a <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> object. <em>response</em> will be an object implementing the same interface as the return value of <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a>. The return value should implement the same interface as the return value of <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a>.</p> </dd> +</dl> </section> <section id="httpredirecthandler-objects"> <span id="http-redirect-handler"></span><h2>HTTPRedirectHandler Objects</h2> <div class="admonition note"> <p class="admonition-title">Note</p> <p>Some HTTP redirections require action from this module’s client code. If this is the case, <a class="reference internal" href="urllib.error#urllib.error.HTTPError" title="urllib.error.HTTPError"><code>HTTPError</code></a> is raised. 
See <span class="target" id="index-6"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2616.html"><strong>RFC 2616</strong></a> for details of the precise meanings of the various redirection codes.</p> <p>An <code>HTTPError</code> exception is raised as a security consideration if the HTTPRedirectHandler is presented with a redirected URL which is not an HTTP, HTTPS or FTP URL.</p> </div> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPRedirectHandler.redirect_request"> +<code>HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl)</code> </dt> <dd> +<p>Return a <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> or <code>None</code> in response to a redirect. This is called by the default implementations of the <code>http_error_30*()</code> methods when a redirection is received from the server. If a redirection should take place, return a new <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> to allow <code>http_error_30*()</code> to perform the redirect to <em>newurl</em>. Otherwise, raise <a class="reference internal" href="urllib.error#urllib.error.HTTPError" title="urllib.error.HTTPError"><code>HTTPError</code></a> if no other handler should try to handle this URL, or return <code>None</code> if you can’t but another handler might.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>The default implementation of this method does not strictly follow <span class="target" id="index-7"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2616.html"><strong>RFC 2616</strong></a>, which says that 301 and 302 responses to <code>POST</code> requests must not be automatically redirected without confirmation by the user. 
In reality, browsers do allow automatic redirection of these responses, changing the POST to a <code>GET</code>, and the default implementation reproduces this behavior.</p> </div> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPRedirectHandler.http_error_301"> +<code>HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>Redirect to the <code>Location:</code> or <code>URI:</code> URL. This method is called by the parent <a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a> when getting an HTTP ‘moved permanently’ response.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPRedirectHandler.http_error_302"> +<code>HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>The same as <a class="reference internal" href="#urllib.request.HTTPRedirectHandler.http_error_301" title="urllib.request.HTTPRedirectHandler.http_error_301"><code>http_error_301()</code></a>, but called for the ‘found’ response.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPRedirectHandler.http_error_303"> +<code>HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>The same as <a class="reference internal" href="#urllib.request.HTTPRedirectHandler.http_error_301" title="urllib.request.HTTPRedirectHandler.http_error_301"><code>http_error_301()</code></a>, but called for the ‘see other’ response.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPRedirectHandler.http_error_307"> +<code>HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>The same as <a class="reference internal" href="#urllib.request.HTTPRedirectHandler.http_error_301" title="urllib.request.HTTPRedirectHandler.http_error_301"><code>http_error_301()</code></a>, but called 
for the ‘temporary redirect’ response. It does not allow changing the request method from <code>POST</code> to <code>GET</code>.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPRedirectHandler.http_error_308"> +<code>HTTPRedirectHandler.http_error_308(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>The same as <a class="reference internal" href="#urllib.request.HTTPRedirectHandler.http_error_301" title="urllib.request.HTTPRedirectHandler.http_error_301"><code>http_error_301()</code></a>, but called for the ‘permanent redirect’ response. It does not allow changing the request method from <code>POST</code> to <code>GET</code>.</p> <div class="versionadded"> <p><span class="versionmodified added">New in version 3.11.</span></p> </div> </dd> +</dl> </section> <section id="httpcookieprocessor-objects"> <span id="http-cookie-processor"></span><h2>HTTPCookieProcessor Objects</h2> <p><a class="reference internal" href="#urllib.request.HTTPCookieProcessor" title="urllib.request.HTTPCookieProcessor"><code>HTTPCookieProcessor</code></a> instances have one attribute:</p> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.HTTPCookieProcessor.cookiejar"> +<code>HTTPCookieProcessor.cookiejar</code> </dt> <dd> +<p>The <a class="reference internal" href="http.cookiejar#http.cookiejar.CookieJar" title="http.cookiejar.CookieJar"><code>http.cookiejar.CookieJar</code></a> in which cookies are stored.</p> </dd> +</dl> </section> <section id="proxyhandler-objects"> <span id="proxy-handler"></span><h2>ProxyHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py"> <span class="sig-name descname">ProxyHandler.<protocol>_open(request)</span> +</dt> <dd> +<p>The <a class="reference internal" href="#urllib.request.ProxyHandler" title="urllib.request.ProxyHandler"><code>ProxyHandler</code></a> will have a method <code><protocol>_open()</code> for every <em>protocol</em> which has a proxy in the 
<em>proxies</em> dictionary given in the constructor. The method will modify requests to go through the proxy, by calling <code>request.set_proxy()</code>, and call the next handler in the chain to actually execute the protocol.</p> </dd> +</dl> </section> <section id="httppasswordmgr-objects"> <span id="http-password-mgr"></span><h2>HTTPPasswordMgr Objects</h2> <p>These methods are available on <a class="reference internal" href="#urllib.request.HTTPPasswordMgr" title="urllib.request.HTTPPasswordMgr"><code>HTTPPasswordMgr</code></a> and <a class="reference internal" href="#urllib.request.HTTPPasswordMgrWithDefaultRealm" title="urllib.request.HTTPPasswordMgrWithDefaultRealm"><code>HTTPPasswordMgrWithDefaultRealm</code></a> objects.</p> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPPasswordMgr.add_password"> +<code>HTTPPasswordMgr.add_password(realm, uri, user, passwd)</code> </dt> <dd> +<p><em>uri</em> can be either a single URI, or a sequence of URIs. <em>realm</em>, <em>user</em> and <em>passwd</em> must be strings. This causes <code>(user, passwd)</code> to be used as authentication tokens when authentication for <em>realm</em> and a super-URI of any of the given URIs is given.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPPasswordMgr.find_user_password"> +<code>HTTPPasswordMgr.find_user_password(realm, authuri)</code> </dt> <dd> +<p>Get user/password for given realm and URI, if any. 
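</p> <p>The lookup can be tried in isolation. In the sketch below the realm name, URIs and credentials are invented for illustration; it registers one entry under a named realm and one under the default realm <code>None</code>, then queries both.</p>

```python
from urllib.request import HTTPPasswordMgrWithDefaultRealm

mgr = HTTPPasswordMgrWithDefaultRealm()

# Hypothetical realm, URIs and credentials.
mgr.add_password("Staff Area", "http://example.com/private/", "alice", "secret")
mgr.add_password(None, "http://example.com/", "guest", "guest")

# URI below the registered one, matching realm.
exact = mgr.find_user_password("Staff Area", "http://example.com/private/page.html")

# Unknown realm: the lookup falls back to the entry stored under None.
fallback = mgr.find_user_password("Other Realm", "http://example.com/index.html")

# Host that was never registered.
missing = mgr.find_user_password("Staff Area", "http://other.example/")

print(exact, fallback, missing)
```

<p>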
This method will return <code>(None, None)</code> if there is no matching user/password.</p> <p>For <a class="reference internal" href="#urllib.request.HTTPPasswordMgrWithDefaultRealm" title="urllib.request.HTTPPasswordMgrWithDefaultRealm"><code>HTTPPasswordMgrWithDefaultRealm</code></a> objects, the realm <code>None</code> will be searched if the given <em>realm</em> has no matching user/password.</p> </dd> +</dl> </section> <section id="httppasswordmgrwithpriorauth-objects"> <span id="http-password-mgr-with-prior-auth"></span><h2>HTTPPasswordMgrWithPriorAuth Objects</h2> <p>This password manager extends <a class="reference internal" href="#urllib.request.HTTPPasswordMgrWithDefaultRealm" title="urllib.request.HTTPPasswordMgrWithDefaultRealm"><code>HTTPPasswordMgrWithDefaultRealm</code></a> to support tracking URIs for which authentication credentials should always be sent.</p> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPPasswordMgrWithPriorAuth.add_password"> +<code>HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, passwd, is_authenticated=False)</code> </dt> <dd> +<p><em>realm</em>, <em>uri</em>, <em>user</em>, <em>passwd</em> are as for <a class="reference internal" href="#urllib.request.HTTPPasswordMgr.add_password" title="urllib.request.HTTPPasswordMgr.add_password"><code>HTTPPasswordMgr.add_password()</code></a>. <em>is_authenticated</em> sets the initial value of the <code>is_authenticated</code> flag for the given URI or list of URIs. 
If <em>is_authenticated</em> is specified as <code>True</code>, <em>realm</em> is ignored.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPPasswordMgrWithPriorAuth.find_user_password"> +<code>HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri)</code> </dt> <dd> +<p>Same as for <a class="reference internal" href="#urllib.request.HTTPPasswordMgrWithDefaultRealm" title="urllib.request.HTTPPasswordMgrWithDefaultRealm"><code>HTTPPasswordMgrWithDefaultRealm</code></a> objects</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPPasswordMgrWithPriorAuth.update_authenticated"> +<code>HTTPPasswordMgrWithPriorAuth.update_authenticated(self, uri, is_authenticated=False)</code> </dt> <dd> +<p>Update the <code>is_authenticated</code> flag for the given <em>uri</em> or list of URIs.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPPasswordMgrWithPriorAuth.is_authenticated"> +<code>HTTPPasswordMgrWithPriorAuth.is_authenticated(self, authuri)</code> </dt> <dd> +<p>Returns the current state of the <code>is_authenticated</code> flag for the given URI.</p> </dd> +</dl> </section> <section id="abstractbasicauthhandler-objects"> <span id="abstract-basic-auth-handler"></span><h2>AbstractBasicAuthHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.AbstractBasicAuthHandler.http_error_auth_reqed"> +<code>AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers)</code> </dt> <dd> +<p>Handle an authentication request by getting a user/password pair, and re-trying the request. 
<em>authreq</em> should be the name of the header where the information about the realm is included in the request, <em>host</em> specifies the URL and path to authenticate for, <em>req</em> should be the (failed) <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> object, and <em>headers</em> should be the error headers.</p> <p><em>host</em> is either an authority (e.g. <code>"python.org"</code>) or a URL containing an authority component (e.g. <code>"http://python.org/"</code>). In either case, the authority must not contain a userinfo component (so, <code>"python.org"</code> and <code>"python.org:80"</code> are fine, <code>"joe:password@python.org"</code> is not).</p> </dd> +</dl> </section> <section id="httpbasicauthhandler-objects"> <span id="http-basic-auth-handler"></span><h2>HTTPBasicAuthHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPBasicAuthHandler.http_error_401"> +<code>HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>Retry the request with authentication information, if available.</p> </dd> +</dl> </section> <section id="proxybasicauthhandler-objects"> <span id="proxy-basic-auth-handler"></span><h2>ProxyBasicAuthHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.ProxyBasicAuthHandler.http_error_407"> +<code>ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>Retry the request with authentication information, if available.</p> </dd> +</dl> </section> <section id="abstractdigestauthhandler-objects"> <span id="abstract-digest-auth-handler"></span><h2>AbstractDigestAuthHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.AbstractDigestAuthHandler.http_error_auth_reqed"> +<code>AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers)</code> </dt> <dd> +<p><em>authreq</em> 
should be the name of the header where the information about the realm is included in the request, <em>host</em> should be the host to authenticate to, <em>req</em> should be the (failed) <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> object, and <em>headers</em> should be the error headers.</p> </dd> +</dl> </section> <section id="httpdigestauthhandler-objects"> <span id="http-digest-auth-handler"></span><h2>HTTPDigestAuthHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPDigestAuthHandler.http_error_401"> +<code>HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>Retry the request with authentication information, if available.</p> </dd> +</dl> </section> <section id="proxydigestauthhandler-objects"> <span id="proxy-digest-auth-handler"></span><h2>ProxyDigestAuthHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.ProxyDigestAuthHandler.http_error_407"> +<code>ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs)</code> </dt> <dd> +<p>Retry the request with authentication information, if available.</p> </dd> +</dl> </section> <section id="httphandler-objects"> <span id="http-handler-objects"></span><h2>HTTPHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPHandler.http_open"> +<code>HTTPHandler.http_open(req)</code> </dt> <dd> +<p>Send an HTTP request. Unless the request specifies a method explicitly, it is sent as GET when <code>req.data</code> is <code>None</code> and as POST otherwise.</p> </dd> +</dl> </section> <section id="httpshandler-objects"> <span id="https-handler-objects"></span><h2>HTTPSHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPSHandler.https_open"> +<code>HTTPSHandler.https_open(req)</code> </dt> <dd> +<p>Send an HTTPS request. Unless the request specifies a method explicitly, it is sent as GET when <code>req.data</code> is <code>None</code> and as POST otherwise.</p> </dd> +</dl> 
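<p>As a minimal sketch of how the request method is chosen, the snippet below only builds <code>Request</code> objects and sends nothing over the network; the URL and payload are placeholder assumptions. <code>Request.get_method()</code> reports the method that <code>http_open()</code> or <code>https_open()</code> would send:</p>

```python
import urllib.request

# Placeholder URL for illustration only; no network request is made here.
URL = "https://www.example.com/"

# No data: the method defaults to GET.
get_req = urllib.request.Request(URL)

# Data supplied: the method defaults to POST.
post_req = urllib.request.Request(URL, data=b"spam=1")

# An explicit method overrides the default derived from data.
put_req = urllib.request.Request(URL, data=b"spam=1", method="PUT")

methods = [r.get_method() for r in (get_req, post_req, put_req)]
print(methods)  # ['GET', 'POST', 'PUT']
```

<p>An explicit <code>method</code> argument takes precedence over the default derived from <code>data</code>.</p>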
</section> <section id="filehandler-objects"> <span id="file-handler-objects"></span><h2>FileHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.FileHandler.file_open"> +<code>FileHandler.file_open(req)</code> </dt> <dd> +<p>Open the file locally, if there is no host name, or the host name is <code>'localhost'</code>.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.2: </span>This method is applicable only for local hostnames. When a remote hostname is given, an <a class="reference internal" href="urllib.error#urllib.error.URLError" title="urllib.error.URLError"><code>URLError</code></a> is raised.</p> </div> </dd> +</dl> </section> <section id="datahandler-objects"> <span id="data-handler-objects"></span><h2>DataHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.DataHandler.data_open"> +<code>DataHandler.data_open(req)</code> </dt> <dd> +<p>Read a data URL. This kind of URL contains the content encoded in the URL itself. The data URL syntax is specified in <span class="target" id="index-8"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2397.html"><strong>RFC 2397</strong></a>. This implementation ignores white spaces in base64 encoded data URLs so the URL may be wrapped in whatever source file it comes from. But even though some browsers don’t mind about a missing padding at the end of a base64 encoded data URL, this implementation will raise an <a class="reference internal" href="exceptions#ValueError" title="ValueError"><code>ValueError</code></a> in that case.</p> </dd> +</dl> </section> <section id="ftphandler-objects"> <span id="ftp-handler-objects"></span><h2>FTPHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.FTPHandler.ftp_open"> +<code>FTPHandler.ftp_open(req)</code> </dt> <dd> +<p>Open the FTP file indicated by <em>req</em>. 
The login is always done with empty username and password.</p> </dd> +</dl> </section> <section id="cacheftphandler-objects"> <span id="cacheftp-handler-objects"></span><h2>CacheFTPHandler Objects</h2> <p><a class="reference internal" href="#urllib.request.CacheFTPHandler" title="urllib.request.CacheFTPHandler"><code>CacheFTPHandler</code></a> objects are <a class="reference internal" href="#urllib.request.FTPHandler" title="urllib.request.FTPHandler"><code>FTPHandler</code></a> objects with the following additional methods:</p> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.CacheFTPHandler.setTimeout"> +<code>CacheFTPHandler.setTimeout(t)</code> </dt> <dd> +<p>Set timeout of connections to <em>t</em> seconds.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.CacheFTPHandler.setMaxConns"> +<code>CacheFTPHandler.setMaxConns(m)</code> </dt> <dd> +<p>Set maximum number of cached connections to <em>m</em>.</p> </dd> +</dl> </section> <section id="unknownhandler-objects"> <span id="unknown-handler-objects"></span><h2>UnknownHandler Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.UnknownHandler.unknown_open"> +<code>UnknownHandler.unknown_open()</code> </dt> <dd> +<p>Raise a <a class="reference internal" href="urllib.error#urllib.error.URLError" title="urllib.error.URLError"><code>URLError</code></a> exception.</p> </dd> +</dl> </section> <section id="httperrorprocessor-objects"> <span id="http-error-processor-objects"></span><h2>HTTPErrorProcessor Objects</h2> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPErrorProcessor.http_response"> +<code>HTTPErrorProcessor.http_response(request, response)</code> </dt> <dd> +<p>Process HTTP error responses.</p> <p>For 2xx status codes (200 through 299), the response object is returned immediately.</p> <p>For all other status codes, this simply passes the job on to the <code>http_error_<type>()</code> handler methods, via <a 
class="reference internal" href="#urllib.request.OpenerDirector.error" title="urllib.request.OpenerDirector.error"><code>OpenerDirector.error()</code></a>. Eventually, <a class="reference internal" href="#urllib.request.HTTPDefaultErrorHandler" title="urllib.request.HTTPDefaultErrorHandler"><code>HTTPDefaultErrorHandler</code></a> will raise an <a class="reference internal" href="urllib.error#urllib.error.HTTPError" title="urllib.error.HTTPError"><code>HTTPError</code></a> if no other handler handles the error.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.HTTPErrorProcessor.https_response"> +<code>HTTPErrorProcessor.https_response(request, response)</code> </dt> <dd> +<p>Process HTTPS error responses.</p> <p>The behavior is the same as <a class="reference internal" href="#urllib.request.HTTPErrorProcessor.http_response" title="urllib.request.HTTPErrorProcessor.http_response"><code>http_response()</code></a>.</p> </dd> +</dl> </section> <section id="examples"> <span id="urllib-request-examples"></span><h2>Examples</h2> <p>In addition to the examples below, more examples are given in <a class="reference internal" href="../howto/urllib2#urllib-howto"><span class="std std-ref">HOWTO Fetch Internet Resources Using The urllib Package</span></a>.</p> <p>This example gets the python.org main page and displays the first 300 bytes of it.</p> <pre data-language="python">>>> import urllib.request +>>> with urllib.request.urlopen('http://www.python.org/') as f: +... print(f.read(300)) +... +b'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" +"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html +xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">\n\n<head>\n +<meta http-equiv="content-type" content="text/html; charset=utf-8" />\n +<title>Python Programming ' +</pre> <p>Note that urlopen returns a bytes object. 
This is because there is no way for urlopen to automatically determine the encoding of the byte stream it receives from the HTTP server. In general, a program will decode the returned bytes object to string once it determines or guesses the appropriate encoding.</p> <p>The following W3C document, <a class="reference external" href="https://www.w3.org/International/O-charset">https://www.w3.org/International/O-charset</a>, lists the various ways in which an (X)HTML or an XML document could have specified its encoding information.</p> <p>As the python.org website uses <em>utf-8</em> encoding as specified in its meta tag, we will use the same for decoding the bytes object.</p> <pre data-language="python">>>> with urllib.request.urlopen('http://www.python.org/') as f: +... print(f.read(100).decode('utf-8')) +... +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" +"http://www.w3.org/TR/xhtml1/DTD/xhtm +</pre> <p>It is also possible to achieve the same result without using the <a class="reference internal" href="../glossary#term-context-manager"><span class="xref std std-term">context manager</span></a> approach.</p> <pre data-language="python">>>> import urllib.request +>>> f = urllib.request.urlopen('http://www.python.org/') +>>> print(f.read(100).decode('utf-8')) +<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" +"http://www.w3.org/TR/xhtml1/DTD/xhtm +</pre> <p>In the following example, we are sending a data-stream to the stdin of a CGI and reading the data it returns to us. Note that this example will only work when the Python installation supports SSL.</p> <pre data-language="python">>>> import urllib.request +>>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi', +... data=b'This data is passed to stdin of the CGI') +>>> with urllib.request.urlopen(req) as f: +... print(f.read().decode('utf-8')) +... 
+Got Data: "This data is passed to stdin of the CGI" +</pre> <p>The code for the sample CGI used in the above example is:</p> <pre data-language="python">#!/usr/bin/env python +import sys +data = sys.stdin.read() +print('Content-type: text/plain\n\nGot Data: "%s"' % data) +</pre> <p>Here is an example of doing a <code>PUT</code> request using <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a>:</p> <pre data-language="python">import urllib.request +DATA = b'some data' +req = urllib.request.Request(url='http://localhost:8080', data=DATA, method='PUT') +with urllib.request.urlopen(req) as f: + pass +print(f.status) +print(f.reason) +</pre> <p>Use of Basic HTTP Authentication:</p> <pre data-language="python">import urllib.request +# Create an OpenerDirector with support for Basic HTTP Authentication... +auth_handler = urllib.request.HTTPBasicAuthHandler() +auth_handler.add_password(realm='PDQ Application', + uri='https://mahler:8092/site-updates.py', + user='klem', + passwd='kadidd!ehopper') +opener = urllib.request.build_opener(auth_handler) +# ...and install it globally so it can be used with urlopen. +urllib.request.install_opener(opener) +urllib.request.urlopen('http://www.example.com/login.html') +</pre> <p><a class="reference internal" href="#urllib.request.build_opener" title="urllib.request.build_opener"><code>build_opener()</code></a> provides many handlers by default, including a <a class="reference internal" href="#urllib.request.ProxyHandler" title="urllib.request.ProxyHandler"><code>ProxyHandler</code></a>. By default, <a class="reference internal" href="#urllib.request.ProxyHandler" title="urllib.request.ProxyHandler"><code>ProxyHandler</code></a> uses the environment variables named <code><scheme>_proxy</code>, where <code><scheme></code> is the URL scheme involved. 
For example, the <span class="target" id="index-9"></span><code>http_proxy</code> environment variable is read to obtain the HTTP proxy’s URL.</p> <p>This example replaces the default <a class="reference internal" href="#urllib.request.ProxyHandler" title="urllib.request.ProxyHandler"><code>ProxyHandler</code></a> with one that uses programmatically supplied proxy URLs, and adds proxy authorization support with <a class="reference internal" href="#urllib.request.ProxyBasicAuthHandler" title="urllib.request.ProxyBasicAuthHandler"><code>ProxyBasicAuthHandler</code></a>.</p> <pre data-language="python">proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'}) +proxy_auth_handler = urllib.request.ProxyBasicAuthHandler() +proxy_auth_handler.add_password('realm', 'host', 'username', 'password') + +opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler) +# This time, rather than install the OpenerDirector, we use it directly: +opener.open('http://www.example.com/login.html') +</pre> <p>Adding HTTP headers:</p> <p>Use the <em>headers</em> argument to the <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> constructor, or:</p> <pre data-language="python">import urllib.request +req = urllib.request.Request('http://www.example.com/') +req.add_header('Referer', 'http://www.python.org/') +# Customize the default User-Agent header value: +req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)') +r = urllib.request.urlopen(req) +</pre> <p><a class="reference internal" href="#urllib.request.OpenerDirector" title="urllib.request.OpenerDirector"><code>OpenerDirector</code></a> automatically adds a <em class="mailheader">User-Agent</em> header to every <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a>. 
To change this:</p> <pre data-language="python">import urllib.request +opener = urllib.request.build_opener() +opener.addheaders = [('User-agent', 'Mozilla/5.0')] +opener.open('http://www.example.com/') +</pre> <p>Also, remember that a few standard headers (<em class="mailheader">Content-Length</em>, <em class="mailheader">Content-Type</em> and <em class="mailheader">Host</em>) are added when the <a class="reference internal" href="#urllib.request.Request" title="urllib.request.Request"><code>Request</code></a> is passed to <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a> (or <a class="reference internal" href="#urllib.request.OpenerDirector.open" title="urllib.request.OpenerDirector.open"><code>OpenerDirector.open()</code></a>).</p> <p id="urllib-examples">Here is an example session that uses the <code>GET</code> method to retrieve a URL containing parameters:</p> <pre data-language="python">>>> import urllib.request +>>> import urllib.parse +>>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}) +>>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params +>>> with urllib.request.urlopen(url) as f: +... print(f.read().decode('utf-8')) +... +</pre> <p>The following example uses the <code>POST</code> method instead. Note that params output from urlencode is encoded to bytes before it is sent to urlopen as data:</p> <pre data-language="python">>>> import urllib.request +>>> import urllib.parse +>>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}) +>>> data = data.encode('ascii') +>>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f: +... print(f.read().decode('utf-8')) +... 
+</pre> <p>The following example uses an explicitly specified HTTP proxy, overriding environment settings:</p> <pre data-language="python">>>> import urllib.request +>>> proxies = {'http': 'http://proxy.example.com:8080/'} +>>> opener = urllib.request.FancyURLopener(proxies) +>>> with opener.open("http://www.python.org") as f: +... f.read().decode('utf-8') +... +</pre> <p>The following example uses no proxies at all, overriding environment settings:</p> <pre data-language="python">>>> import urllib.request +>>> opener = urllib.request.FancyURLopener({}) +>>> with opener.open("http://www.python.org/") as f: +... f.read().decode('utf-8') +... +</pre> </section> <section id="legacy-interface"> <h2>Legacy interface</h2> <p>The following functions and classes are ported from the Python 2 module <code>urllib</code> (as opposed to <code>urllib2</code>). They might become deprecated at some point in the future.</p> <dl class="py function"> <dt class="sig sig-object py" id="urllib.request.urlretrieve"> +<code>urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)</code> </dt> <dd> +<p>Copy a network object denoted by a URL to a local file. If the URL points to a local file, the object will not be copied unless filename is supplied. Return a tuple <code>(filename, headers)</code> where <em>filename</em> is the local file name under which the object can be found, and <em>headers</em> is whatever the <code>info()</code> method of the object returned by <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a> returned (for a remote object). Exceptions are the same as for <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a>.</p> <p>The second argument, if present, specifies the file location to copy to (if absent, the location will be a tempfile with a generated name). 
The third argument, if present, is a callable that will be called once on establishment of the network connection and once after each block read thereafter. The callable will be passed three arguments: a count of blocks transferred so far, a block size in bytes, and the total size of the file. The total size passed to the callable may be <code>-1</code> on older FTP servers which do not return a file size in response to a retrieval request.</p> <p>The following example illustrates the most common usage scenario:</p> <pre data-language="python">>>> import urllib.request +>>> local_filename, headers = urllib.request.urlretrieve('http://python.org/') +>>> html = open(local_filename) +>>> html.close() +</pre> <p>If the <em>url</em> uses the <code>http:</code> scheme identifier, the optional <em>data</em> argument may be given to specify a <code>POST</code> request (normally the request type is <code>GET</code>). The <em>data</em> argument must be a bytes object in standard <em class="mimetype">application/x-www-form-urlencoded</em> format; see the <a class="reference internal" href="urllib.parse#urllib.parse.urlencode" title="urllib.parse.urlencode"><code>urllib.parse.urlencode()</code></a> function.</p> <p><a class="reference internal" href="#urllib.request.urlretrieve" title="urllib.request.urlretrieve"><code>urlretrieve()</code></a> will raise <code>ContentTooShortError</code> when it detects that the amount of data available was less than the expected amount (which is the size reported by a <em>Content-Length</em> header). 
This can occur, for example, when the download is interrupted.</p> <p>The <em>Content-Length</em> is treated as a lower bound: if there’s more data to read, urlretrieve reads more data, but if less data is available, it raises the exception.</p> <p>You can still retrieve the downloaded data in this case; it is stored in the <code>content</code> attribute of the exception instance.</p> <p>If no <em>Content-Length</em> header was supplied, urlretrieve cannot check the size of the data it has downloaded, and just returns it. In this case you have to assume that the download was successful.</p> </dd> +</dl> <dl class="py function"> <dt class="sig sig-object py" id="urllib.request.urlcleanup"> +<code>urllib.request.urlcleanup()</code> </dt> <dd> +<p>Cleans up temporary files that may have been left behind by previous calls to <a class="reference internal" href="#urllib.request.urlretrieve" title="urllib.request.urlretrieve"><code>urlretrieve()</code></a>.</p> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.URLopener"> +<code>class urllib.request.URLopener(proxies=None, **x509)</code> </dt> <dd> +<div class="deprecated"> <p><span class="versionmodified deprecated">Deprecated since version 3.3.</span></p> </div> <p>Base class for opening and reading URLs. Unless you need to support opening objects using schemes other than <code>http:</code>, <code>ftp:</code>, or <code>file:</code>, you probably want to use <a class="reference internal" href="#urllib.request.FancyURLopener" title="urllib.request.FancyURLopener"><code>FancyURLopener</code></a>.</p> <p>By default, the <a class="reference internal" href="#urllib.request.URLopener" title="urllib.request.URLopener"><code>URLopener</code></a> class sends a <em class="mailheader">User-Agent</em> header of <code>urllib/VVV</code>, where <em>VVV</em> is the <a class="reference internal" href="urllib#module-urllib" title="urllib"><code>urllib</code></a> version number. 
Applications can define their own <em class="mailheader">User-Agent</em> header by subclassing <a class="reference internal" href="#urllib.request.URLopener" title="urllib.request.URLopener"><code>URLopener</code></a> or <a class="reference internal" href="#urllib.request.FancyURLopener" title="urllib.request.FancyURLopener"><code>FancyURLopener</code></a> and setting the class attribute <a class="reference internal" href="#urllib.request.URLopener.version" title="urllib.request.URLopener.version"><code>version</code></a> to an appropriate string value in the subclass definition.</p> <p>The optional <em>proxies</em> parameter should be a dictionary mapping scheme names to proxy URLs, where an empty dictionary turns proxies off completely. Its default value is <code>None</code>, in which case environmental proxy settings will be used if present, as discussed in the definition of <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a>, above.</p> <p>Additional keyword parameters, collected in <em>x509</em>, may be used for authentication of the client when using the <code>https:</code> scheme. The keywords <em>key_file</em> and <em>cert_file</em> are supported to provide an SSL key and certificate; both are needed to support client authentication.</p> <p><a class="reference internal" href="#urllib.request.URLopener" title="urllib.request.URLopener"><code>URLopener</code></a> objects will raise an <a class="reference internal" href="exceptions#OSError" title="OSError"><code>OSError</code></a> exception if the server returns an error code.</p> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.URLopener.open"> +<code>open(fullurl, data=None)</code> </dt> <dd> +<p>Open <em>fullurl</em> using the appropriate protocol. This method sets up cache and proxy information, then calls the appropriate open method with its input arguments. 
If the scheme is not recognized, <a class="reference internal" href="#urllib.request.URLopener.open_unknown" title="urllib.request.URLopener.open_unknown"><code>open_unknown()</code></a> is called. The <em>data</em> argument has the same meaning as the <em>data</em> argument of <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a>.</p> <p>This method always quotes <em>fullurl</em> using <a class="reference internal" href="urllib.parse#urllib.parse.quote" title="urllib.parse.quote"><code>quote()</code></a>.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.URLopener.open_unknown"> +<code>open_unknown(fullurl, data=None)</code> </dt> <dd> +<p>Overridable interface to open unknown URL types.</p> </dd> +</dl> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.URLopener.retrieve"> +<code>retrieve(url, filename=None, reporthook=None, data=None)</code> </dt> <dd> +<p>Retrieves the contents of <em>url</em> and places it in <em>filename</em>. The return value is a tuple consisting of a local filename and either an <a class="reference internal" href="email.compat32-message#email.message.Message" title="email.message.Message"><code>email.message.Message</code></a> object containing the response headers (for remote URLs) or <code>None</code> (for local URLs). The caller must then open and read the contents of <em>filename</em>. If <em>filename</em> is not given and the URL refers to a local file, the input filename is returned. If the URL is non-local and <em>filename</em> is not given, the filename is the output of <a class="reference internal" href="tempfile#tempfile.mktemp" title="tempfile.mktemp"><code>tempfile.mktemp()</code></a> with a suffix that matches the suffix of the last path component of the input URL. 
If <em>reporthook</em> is given, it must be a function accepting three numeric parameters: a chunk number, the maximum size chunks are read in, and the total size of the download (-1 if unknown). It will be called once at the start and after each chunk of data is read from the network. <em>reporthook</em> is ignored for local URLs.</p> <p>If the <em>url</em> uses the <code>http:</code> scheme identifier, the optional <em>data</em> argument may be given to specify a <code>POST</code> request (normally the request type is <code>GET</code>). The <em>data</em> argument must be in standard <em class="mimetype">application/x-www-form-urlencoded</em> format; see the <a class="reference internal" href="urllib.parse#urllib.parse.urlencode" title="urllib.parse.urlencode"><code>urllib.parse.urlencode()</code></a> function.</p> </dd> +</dl> <dl class="py attribute"> <dt class="sig sig-object py" id="urllib.request.URLopener.version"> +<code>version</code> </dt> <dd> +<p>Variable that specifies the user agent of the opener object. To get <a class="reference internal" href="urllib#module-urllib" title="urllib"><code>urllib</code></a> to tell servers that it is a particular user agent, set this in a subclass as a class variable or in the constructor before calling the base constructor.</p> </dd> +</dl> </dd> +</dl> <dl class="py class"> <dt class="sig sig-object py" id="urllib.request.FancyURLopener"> +<code>class urllib.request.FancyURLopener(...)</code> </dt> <dd> +<div class="deprecated"> <p><span class="versionmodified deprecated">Deprecated since version 3.3.</span></p> </div> <p><a class="reference internal" href="#urllib.request.FancyURLopener" title="urllib.request.FancyURLopener"><code>FancyURLopener</code></a> subclasses <a class="reference internal" href="#urllib.request.URLopener" title="urllib.request.URLopener"><code>URLopener</code></a> providing default handling for the following HTTP response codes: 301, 302, 303, 307 and 401. 
For the 30x response codes listed above, the <em class="mailheader">Location</em> header is used to fetch the actual URL. For 401 response codes (authentication required), basic HTTP authentication is performed. For the 30x response codes, recursion is bounded by the value of the <em>maxtries</em> attribute, which defaults to 10.</p> <p>For all other response codes, the method <code>http_error_default()</code> is called which you can override in subclasses to handle the error appropriately.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>According to the letter of <span class="target" id="index-10"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2616.html"><strong>RFC 2616</strong></a>, 301 and 302 responses to POST requests must not be automatically redirected without confirmation by the user. In reality, browsers do allow automatic redirection of these responses, changing the POST to a GET, and <a class="reference internal" href="urllib#module-urllib" title="urllib"><code>urllib</code></a> reproduces this behaviour.</p> </div> <p>The parameters to the constructor are the same as those for <a class="reference internal" href="#urllib.request.URLopener" title="urllib.request.URLopener"><code>URLopener</code></a>.</p> <div class="admonition note"> <p class="admonition-title">Note</p> <p>When performing basic authentication, a <a class="reference internal" href="#urllib.request.FancyURLopener" title="urllib.request.FancyURLopener"><code>FancyURLopener</code></a> instance calls its <a class="reference internal" href="#urllib.request.FancyURLopener.prompt_user_passwd" title="urllib.request.FancyURLopener.prompt_user_passwd"><code>prompt_user_passwd()</code></a> method. The default implementation asks the users for the required information on the controlling terminal. 
A subclass may override this method to support more appropriate behavior if needed.</p> </div> <p>The <a class="reference internal" href="#urllib.request.FancyURLopener" title="urllib.request.FancyURLopener"><code>FancyURLopener</code></a> class offers one additional method that should be overloaded to provide the appropriate behavior:</p> <dl class="py method"> <dt class="sig sig-object py" id="urllib.request.FancyURLopener.prompt_user_passwd"> +<code>prompt_user_passwd(host, realm)</code> </dt> <dd> +<p>Return information needed to authenticate the user at the given host in the specified security realm. The return value should be a tuple, <code>(user, +password)</code>, which can be used for basic authentication.</p> <p>The implementation prompts for this information on the terminal; an application should override this method to use an appropriate interaction model in the local environment.</p> </dd> +</dl> </dd> +</dl> </section> <section id="urllib-request-restrictions"> <h2>urllib.request Restrictions</h2> <ul id="index-11"> <li> +<p>Currently, only the following protocols are supported: HTTP (versions 0.9 and 1.0), FTP, local files, and data URLs.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.4: </span>Added support for data URLs.</p> </div> </li> <li>The caching feature of <a class="reference internal" href="#urllib.request.urlretrieve" title="urllib.request.urlretrieve"><code>urlretrieve()</code></a> has been disabled until someone finds the time to hack proper processing of Expiration time headers.</li> <li>There should be a function to query whether a particular URL is in the cache.</li> <li>For backward compatibility, if a URL appears to point to a local file but the file can’t be opened, the URL is re-interpreted using the FTP protocol. 
This can sometimes cause confusing error messages.</li> <li>The <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a> and <a class="reference internal" href="#urllib.request.urlretrieve" title="urllib.request.urlretrieve"><code>urlretrieve()</code></a> functions can cause arbitrarily long delays while waiting for a network connection to be set up. This means that it is difficult to build an interactive web client using these functions without using threads. </li> <li id="index-12">The data returned by <a class="reference internal" href="#urllib.request.urlopen" title="urllib.request.urlopen"><code>urlopen()</code></a> or <a class="reference internal" href="#urllib.request.urlretrieve" title="urllib.request.urlretrieve"><code>urlretrieve()</code></a> is the raw data returned by the server. This may be binary data (such as an image), plain text or (for example) HTML. The HTTP protocol provides type information in the reply header, which can be inspected by looking at the <em class="mailheader">Content-Type</em> header. If the returned data is HTML, you can use the module <a class="reference internal" href="html.parser#module-html.parser" title="html.parser: A simple parser that can handle HTML and XHTML."><code>html.parser</code></a> to parse it. </li> <li id="index-13">The code handling the FTP protocol cannot differentiate between a file and a directory. This can lead to unexpected behavior when attempting to read a URL that points to a file that is not accessible. If the URL ends in a <code>/</code>, it is assumed to refer to a directory and will be handled accordingly. But if an attempt to read a file leads to a 550 error (meaning the URL cannot be found or is not accessible, often for permission reasons), then the path is treated as a directory in order to handle the case when a directory is specified by a URL but the trailing <code>/</code> has been left off. 
This can cause misleading results when you try to fetch a file whose read permissions make it inaccessible; the FTP code will try to read it, fail with a 550 error, and then perform a directory listing for the unreadable file. If fine-grained control is needed, consider using the <a class="reference internal" href="ftplib#module-ftplib" title="ftplib: FTP protocol client (requires sockets)."><code>ftplib</code></a> module, subclassing <a class="reference internal" href="#urllib.request.FancyURLopener" title="urllib.request.FancyURLopener"><code>FancyURLopener</code></a>, or changing <em>_urlopener</em> to meet your needs.</li> </ul> </section> <div class="_attribution"> + <p class="_attribution-p"> + © 2001–2023 Python Software Foundation<br>Licensed under the PSF License.<br> + <a href="https://docs.python.org/3.12/library/urllib.request.html" class="_attribution-link">https://docs.python.org/3.12/library/urllib.request.html</a> + </p> +</div> |
