 <span id="email-header-internationalized-headers"></span><h1>email.header: Internationalized headers</h1> <p><strong>Source code:</strong> <a class="reference external" href="https://github.com/python/cpython/tree/3.12/Lib/email/header.py">Lib/email/header.py</a></p>  <p>This module is part of the legacy (<code>Compat32</code>) email API. In the current API, encoding and decoding of headers is handled transparently by the dictionary-like API of the <a class="reference internal" href="email.message#email.message.EmailMessage" title="email.message.EmailMessage"><code>EmailMessage</code></a> class. In addition to uses in legacy code, this module can be useful in applications that need to completely control the character sets used when encoding headers.</p> <p>The remaining text in this section is the original documentation of the module.</p> <p><span class="target" id="index-0"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2822.html"><strong>RFC 2822</strong></a> is the base standard that describes the format of email messages. It derives from the older <span class="target" id="index-1"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc822.html"><strong>RFC 822</strong></a> standard, which came into widespread use at a time when most email was composed of ASCII characters only. <span class="target" id="index-2"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2822.html"><strong>RFC 2822</strong></a> is a specification written assuming email contains only 7-bit ASCII characters.</p> <p>Of course, as email has been deployed worldwide, it has become internationalized, such that language-specific character sets can now be used in email messages. 
The base standard still requires email messages to be transferred using only 7-bit ASCII characters, so a slew of RFCs have been written describing how to encode email containing non-ASCII characters into <span class="target" id="index-3"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2822.html"><strong>RFC 2822</strong></a>-compliant format. These RFCs include <span class="target" id="index-4"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2045.html"><strong>RFC 2045</strong></a>, <span class="target" id="index-5"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2046.html"><strong>RFC 2046</strong></a>, <span class="target" id="index-6"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2047.html"><strong>RFC 2047</strong></a>, and <span class="target" id="index-7"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2231.html"><strong>RFC 2231</strong></a>. 
The <a class="reference internal" href="email#module-email" title="email: Package supporting the parsing, manipulating, and generating email messages."><code>email</code></a> package supports these standards in its <a class="reference internal" href="#module-email.header" title="email.header: Representing non-ASCII headers"><code>email.header</code></a> and <a class="reference internal" href="email.charset#module-email.charset" title="email.charset: Character Sets"><code>email.charset</code></a> modules.</p> <p>If you want to include non-ASCII characters in your email headers, say in the <em class="mailheader">Subject</em> or <em class="mailheader">To</em> fields, you should use the <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> class and assign the field in the <a class="reference internal" href="email.compat32-message#email.message.Message" title="email.message.Message"><code>Message</code></a> object to an instance of <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> instead of using a string for the header value. Import the <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> class from the <a class="reference internal" href="#module-email.header" title="email.header: Representing non-ASCII headers"><code>email.header</code></a> module. For example:</p> <pre data-language="python">&gt;&gt;&gt; from email.message import Message
&gt;&gt;&gt; from email.header import Header
&gt;&gt;&gt; msg = Message()
&gt;&gt;&gt; h = Header('p\xf6stal', 'iso-8859-1')
&gt;&gt;&gt; msg['Subject'] = h
&gt;&gt;&gt; msg.as_string()
'Subject: =?iso-8859-1?q?p=F6stal?=\n\n'
</pre> <p>Notice that the <em class="mailheader">Subject</em> field contains a non-ASCII character. We did this by creating a <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> instance and passing in the character set in which the string should be encoded. When the subsequent <a class="reference internal" href="email.compat32-message#email.message.Message" title="email.message.Message"><code>Message</code></a> instance was flattened, the <em class="mailheader">Subject</em> field was properly <span class="target" id="index-8"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2047.html"><strong>RFC 2047</strong></a> encoded. MIME-aware mail readers would show this header using the embedded ISO-8859-1 character.</p> <p>Here is the <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> class description:</p> <dl class="py class"> <dt class="sig sig-object py" id="email.header.Header">
<code>class email.header.Header(s=None, charset=None, maxlinelen=None, header_name=None, continuation_ws=' ', errors='strict')</code> </dt> <dd>
<p>Create a MIME-compliant header that can contain strings in different character sets.</p> <p>Optional <em>s</em> is the initial header value. If <code>None</code> (the default), the initial header value is not set. You can later append to the header with <a class="reference internal" href="#email.header.Header.append" title="email.header.Header.append"><code>append()</code></a> method calls. <em>s</em> may be an instance of <a class="reference internal" href="stdtypes#bytes" title="bytes"><code>bytes</code></a> or <a class="reference internal" href="stdtypes#str" title="str"><code>str</code></a>, but see the <a class="reference internal" href="#email.header.Header.append" title="email.header.Header.append"><code>append()</code></a> documentation for semantics.</p> <p>Optional <em>charset</em> serves two purposes: it has the same meaning as the <em>charset</em> argument to the <a class="reference internal" href="#email.header.Header.append" title="email.header.Header.append"><code>append()</code></a> method. It also sets the default character set for all subsequent <a class="reference internal" href="#email.header.Header.append" title="email.header.Header.append"><code>append()</code></a> calls that omit the <em>charset</em> argument. If <em>charset</em> is not provided in the constructor (the default), the <code>us-ascii</code> character set is used both as <em>s</em>’s initial charset and as the default for subsequent <a class="reference internal" href="#email.header.Header.append" title="email.header.Header.append"><code>append()</code></a> calls.</p> <p>The maximum line length can be specified explicitly via <em>maxlinelen</em>. To have the first line split at a shorter length (to account for the field name, e.g. <em class="mailheader">Subject</em>, which isn’t included in <em>s</em>), pass the name of the field in <em>header_name</em>. 
The default <em>maxlinelen</em> is 76, and the default value for <em>header_name</em> is <code>None</code>, meaning it is not taken into account for the first line of a long, split header.</p> <p>Optional <em>continuation_ws</em> must be <span class="target" id="index-9"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2822.html"><strong>RFC 2822</strong></a>-compliant folding whitespace, and is usually either a space or a hard tab character. This character will be prepended to continuation lines. <em>continuation_ws</em> defaults to a single space character.</p> <p>Optional <em>errors</em> is passed straight through to the <a class="reference internal" href="#email.header.Header.append" title="email.header.Header.append"><code>append()</code></a> method.</p> <dl class="py method"> <dt class="sig sig-object py" id="email.header.Header.append">
<code>append(s, charset=None, errors='strict')</code> </dt> <dd>
<p>Append the string <em>s</em> to the MIME header.</p> <p>Optional <em>charset</em>, if given, should be a <a class="reference internal" href="email.charset#email.charset.Charset" title="email.charset.Charset"><code>Charset</code></a> instance (see <a class="reference internal" href="email.charset#module-email.charset" title="email.charset: Character Sets"><code>email.charset</code></a>) or the name of a character set, which will be converted to a <a class="reference internal" href="email.charset#email.charset.Charset" title="email.charset.Charset"><code>Charset</code></a> instance. A value of <code>None</code> (the default) means that the <em>charset</em> given in the constructor is used.</p> <p><em>s</em> may be an instance of <a class="reference internal" href="stdtypes#bytes" title="bytes"><code>bytes</code></a> or <a class="reference internal" href="stdtypes#str" title="str"><code>str</code></a>. If it is an instance of <a class="reference internal" href="stdtypes#bytes" title="bytes"><code>bytes</code></a>, then <em>charset</em> is the encoding of that byte string, and a <a class="reference internal" href="exceptions#UnicodeError" title="UnicodeError"><code>UnicodeError</code></a> will be raised if the string cannot be decoded with that character set.</p> <p>If <em>s</em> is an instance of <a class="reference internal" href="stdtypes#str" title="str"><code>str</code></a>, then <em>charset</em> is a hint specifying the character set of the characters in the string.</p> <p>In either case, when producing an <span class="target" id="index-10"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2822.html"><strong>RFC 2822</strong></a>-compliant header using <span class="target" id="index-11"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2047.html"><strong>RFC 2047</strong></a> rules, the string will be encoded using the output codec of the charset. 
If the string cannot be encoded using the output codec, a UnicodeError will be raised.</p> <p>Optional <em>errors</em> is passed as the errors argument to the decode call if <em>s</em> is a byte string.</p> </dd>
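<dd> <p>As a minimal sketch of the semantics described above (the sample strings are illustrative only), the following appends a <code>str</code> chunk and a <code>bytes</code> chunk with explicit character sets, then shows that bytes which cannot be decoded raise <code>UnicodeError</code>:</p> <pre data-language="python">from email.header import Header

# Start with an ASCII chunk, then append a chunk in a second character set.
h = Header('Hello', 'us-ascii')
h.append('p\xf6stal', 'iso-8859-1')
mixed = h.encode()

# A bytes argument is decoded with the given charset before it is stored.
h2 = Header()
h2.append(b'p\xf6stal', 'iso-8859-1')
decoded = str(h2)

# Bytes that cannot be decoded with the charset raise UnicodeError.
try:
    Header().append(b'\xff', 'us-ascii')
    failed = False
except UnicodeError:
    failed = True

print(mixed)
print(ascii(decoded))
print(failed)
</pre> </dd>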
</dl> <dl class="py method"> <dt class="sig sig-object py" id="email.header.Header.encode">
<code>encode(splitchars=';, \t', maxlinelen=None, linesep='\n')</code> </dt> <dd>
<p>Encode a message header into an RFC-compliant format, possibly wrapping long lines and encapsulating non-ASCII parts in base64 or quoted-printable encodings.</p> <p>Optional <em>splitchars</em> is a string containing characters which should be given extra weight by the splitting algorithm during normal header wrapping. This is in very rough support of <span class="target" id="index-12"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2822.html"><strong>RFC 2822</strong></a>'s ‘higher level syntactic breaks’: split points preceded by a splitchar are preferred during line splitting, with the characters preferred in the order in which they appear in the string. Space and tab may be included in the string to indicate whether preference should be given to one over the other as a split point when other split chars do not appear in the line being split. <em>splitchars</em> does not affect <span class="target" id="index-13"></span><a class="rfc reference external" href="https://datatracker.ietf.org/doc/html/rfc2047.html"><strong>RFC 2047</strong></a> encoded lines.</p> <p><em>maxlinelen</em>, if given, overrides the instance’s value for the maximum line length.</p> <p><em>linesep</em> specifies the characters used to separate the lines of the folded header. It defaults to the most useful value for Python application code (<code>\n</code>), but <code>\r\n</code> can be specified in order to produce headers with RFC-compliant line separators.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.2: </span>Added the <em>linesep</em> argument.</p> </div> </dd>
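<dd> <p>The effect of <em>maxlinelen</em> and <em>linesep</em> can be sketched as follows. The subject text and the limits of 30 and 200 are arbitrary values chosen for illustration:</p> <pre data-language="python">from email.header import Header

subject = 'This is a fairly long subject line that will need folding'
h = Header(subject, maxlinelen=30)

# Fold using the instance's maximum line length and the default '\n'.
folded = h.encode()
longest = max(len(line) for line in folded.split('\n'))

# The same folding, but with RFC-compliant CRLF line separators.
crlf = h.encode(linesep='\r\n')

# A per-call maxlinelen overrides the value given to the constructor.
unfolded = h.encode(maxlinelen=200)

print(folded)
print(longest)
print(repr(crlf))
print(unfolded)
</pre> <p>Because this value is plain ASCII, folding happens at whitespace and each continuation line keeps its leading space.</p> </dd>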
</dl> <p>The <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> class also provides a number of methods to support standard operators and built-in functions.</p> <dl class="py method"> <dt class="sig sig-object py" id="email.header.Header.__str__">
<code>__str__()</code> </dt> <dd>
<p>Returns an approximation of the <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> as a string, using an unlimited line length. All pieces are converted to Unicode using the specified encoding and joined together appropriately. Any pieces with a charset of <code>'unknown-8bit'</code> are decoded as ASCII using the <code>'replace'</code> error handler.</p> <div class="versionchanged"> <p><span class="versionmodified changed">Changed in version 3.2: </span>Added handling for the <code>'unknown-8bit'</code> charset.</p> </div> </dd>
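<dd> <p>A short sketch of the difference between <code>str()</code> and <code>encode()</code> on the same header, using illustrative sample text:</p> <pre data-language="python">from email.header import Header

h = Header('Hello', 'us-ascii')
h.append('p\xf6stal', 'iso-8859-1')

# str() gives the readable text; encode() gives the RFC 2047 wire form.
readable = str(h)
wire = h.encode()

print(ascii(readable))
print(wire)
</pre> </dd>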
</dl> <dl class="py method"> <dt class="sig sig-object py" id="email.header.Header.__eq__">
<code>__eq__(other)</code> </dt> <dd>
<p>This method allows you to compare two <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> instances for equality.</p> </dd>
</dl> <dl class="py method"> <dt class="sig sig-object py" id="email.header.Header.__ne__">
<code>__ne__(other)</code> </dt> <dd>
<p>This method allows you to compare two <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> instances for inequality.</p> </dd>
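<dd> <p>For instance, assuming three small illustrative headers, the equality operators can be used like this:</p> <pre data-language="python">from email.header import Header

a = Header('p\xf6stal', 'iso-8859-1')
b = Header('p\xf6stal', 'iso-8859-1')
c = Header('postal')

# Headers built from the same text and charset compare equal.
same = (a == b)
# Headers with different text compare unequal.
different = (a != c)

print(same, different)
</pre> </dd>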
</dl> </dd>
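<dd> <p>The constructor's <em>maxlinelen</em> and <em>header_name</em> arguments can be illustrated with a rough sketch. The subject text and the limit of 40 are arbitrary, and the comparison assumes the field name is followed by a colon and a space when the message is flattened:</p> <pre data-language="python">from email.header import Header

subject = 'This is a fairly long subject line that will need folding'

# Without header_name the first line may use the full 40 columns.
plain = Header(subject, maxlinelen=40)
first_plain = plain.encode().split('\n')[0]

# With header_name, room is reserved on the first line for 'Subject: '.
named = Header(subject, maxlinelen=40, header_name='Subject')
first_named = named.encode().split('\n')[0]

print(len(first_plain))
print(len('Subject: ') + len(first_named))
</pre> </dd>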
</dl> <p>The <a class="reference internal" href="#module-email.header" title="email.header: Representing non-ASCII headers"><code>email.header</code></a> module also provides the following convenient functions.</p> <dl class="py function"> <dt class="sig sig-object py" id="email.header.decode_header">
<code>email.header.decode_header(header)</code> </dt> <dd>
<p>Decode a message header value without converting the character set. The header value is in <em>header</em>.</p> <p>This function returns a list of <code>(decoded_string, charset)</code> pairs containing each of the decoded parts of the header. <em>charset</em> is <code>None</code> for non-encoded parts of the header, otherwise a lower case string containing the name of the character set specified in the encoded string.</p> <p>Here’s an example:</p> <pre data-language="python">&gt;&gt;&gt; from email.header import decode_header
&gt;&gt;&gt; decode_header('=?iso-8859-1?q?p=F6stal?=')
[(b'p\xf6stal', 'iso-8859-1')]
</pre> </dd>
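<dd> <p>Since the character set is not converted, turning the result into text is left to the caller. One possible approach is sketched below. It assumes every reported charset names a codec that Python knows and treats unencoded parts as ASCII:</p> <pre data-language="python">from email.header import decode_header

raw = 'Hello =?iso-8859-1?q?p=F6stal?='
parts = decode_header(raw)

# decode_header() leaves charset conversion to the caller.
pieces = []
for value, charset in parts:
    if isinstance(value, bytes):
        # Unencoded parts have charset None; assume ASCII for those.
        value = value.decode(charset or 'ascii')
    pieces.append(value)
text = ''.join(pieces)

print(parts)
print(ascii(text))
</pre> </dd>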
</dl> <dl class="py function"> <dt class="sig sig-object py" id="email.header.make_header">
<code>email.header.make_header(decoded_seq, maxlinelen=None, header_name=None, continuation_ws=' ')</code> </dt> <dd>
<p>Create a <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> instance from a sequence of pairs as returned by <a class="reference internal" href="#email.header.decode_header" title="email.header.decode_header"><code>decode_header()</code></a>.</p> <p><a class="reference internal" href="#email.header.decode_header" title="email.header.decode_header"><code>decode_header()</code></a> takes a header value string and returns a sequence of pairs of the format <code>(decoded_string, charset)</code> where <em>charset</em> is the name of the character set.</p> <p>This function takes one of those sequences of pairs and returns a <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> instance. Optional <em>maxlinelen</em>, <em>header_name</em>, and <em>continuation_ws</em> are as in the <a class="reference internal" href="#email.header.Header" title="email.header.Header"><code>Header</code></a> constructor.</p> </dd>
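<dd> <p>Combining the two functions gives a compact way to turn an encoded header value into readable text. This is a minimal sketch with an illustrative header value:</p> <pre data-language="python">from email.header import decode_header, make_header

raw = 'Hello =?iso-8859-1?q?p=F6stal?='

# Rebuild a Header from the decoded pairs; str() yields readable text.
h = make_header(decode_header(raw))
text = str(h)

print(ascii(text))
print(h.encode())
</pre> </dd>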
</dl> <div class="_attribution">
  <p class="_attribution-p">
    &copy; 2001&ndash;2023 Python Software Foundation<br>Licensed under the PSF License.<br>
    <a href="https://docs.python.org/3.12/library/email.header.html" class="_attribution-link">https://docs.python.org/3.12/library/email.header.html</a>
  </p>
</div>