| author | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
| commit | 754bbf7a25a8dda49b5d08ef0d0443bbf5af0e36 (patch) | |
| tree | f1190704f78f04a2b0b4c977d20fe96a828377f1 /devdocs/c/language%2Fcharacter_constant.html | |
new repository
Diffstat (limited to 'devdocs/c/language%2Fcharacter_constant.html')
| -rw-r--r-- | devdocs/c/language%2Fcharacter_constant.html | 106 |
1 files changed, 106 insertions, 0 deletions
diff --git a/devdocs/c/language%2Fcharacter_constant.html b/devdocs/c/language%2Fcharacter_constant.html new file mode 100644 index 00000000..acf4a443 --- /dev/null +++ b/devdocs/c/language%2Fcharacter_constant.html @@ -0,0 +1,106 @@ + <h1 id="firstHeading" class="firstHeading">Character constant</h1> <h3 id="Syntax"> Syntax</h3> <table class="t-sdsc-begin"> <tr class="t-sdsc"> <td> <code>'</code><span class="t-spar">c-char</span> <code>'</code> </td> <td> (1) </td> <td class="t-sdsc-nopad"> </td> +</tr> <tr class="t-sdsc"> <td> <code>u8'</code><span class="t-spar">c-char</span> <code>'</code> </td> <td> (2) </td> <td> <span class="t-mark-rev t-since-c23">(since C23)</span> </td> +</tr> <tr class="t-sdsc"> <td> <code>u'</code><span class="t-spar">c-char</span> <code>'</code> </td> <td> (3) </td> <td> <span class="t-mark-rev t-since-c11">(since C11)</span> </td> +</tr> <tr class="t-sdsc"> <td> <code>U'</code><span class="t-spar">c-char</span> <code>'</code> </td> <td> (4) </td> <td> <span class="t-mark-rev t-since-c11">(since C11)</span> </td> +</tr> <tr class="t-sdsc"> <td> <code>L'</code><span class="t-spar">c-char</span> <code>'</code> </td> <td> (5) </td> <td class="t-sdsc-nopad"> </td> +</tr> <tr class="t-sdsc"> <td> <code>'</code><span class="t-spar">c-char-sequence</span> <code>'</code> </td> <td> (6) </td> <td class="t-sdsc-nopad"> </td> +</tr> <tr class="t-sdsc"> <td> <code>L'</code><span class="t-spar">c-char-sequence</span> <code>'</code> </td> <td> (7) </td> <td class="t-sdsc-nopad"> </td> +</tr> <tr class="t-sdsc"> <td> <code>u'</code><span class="t-spar">c-char-sequence</span> <code>'</code> </td> <td> (8) </td> <td> <span class="t-mark-rev t-since-c11">(since C11)</span><span class="t-mark-rev t-until-c23">(removed in C23)</span> </td> +</tr> <tr class="t-sdsc"> <td> <code>U'</code><span class="t-spar">c-char-sequence</span> <code>'</code> </td> <td> (9) </td> <td> <span class="t-mark-rev t-since-c11">(since C11)</span><span class="t-mark-rev 
t-until-c23">(removed in C23)</span> </td> +</tr> +</table> <p>where</p> +<ul> +<li> <span class="t-spar">c-char</span> is either </li> +<ul> +<li> a character from the basic source character set minus single-quote (<code>'</code>), backslash (<code>\</code>), or the newline character. </li> +<li> escape sequence: one of special character escapes <code>\'</code> <code>\"</code> <code>\?</code> <code>\\</code> <code>\a</code> <code>\b</code> <code>\f</code> <code>\n</code> <code>\r</code> <code>\t</code> <code>\v</code>, hex escapes <code>\x...</code> or octal escapes <code>\...</code> as defined in <a href="escape" title="c/language/escape">escape sequences</a>. </li> +</ul> +</ul> <table class="t-rev-begin"> <tr class="t-rev t-since-c99"> +<td> <ul><li>universal character name, <code>\u...</code> or <code>\U...</code> as defined in <a href="escape" title="c/language/escape">escape sequences</a>. </li></ul> </td> <td><span class="t-mark-rev t-since-c99">(since C99)</span></td> +</tr> </table> <ul><li> <span class="t-spar">c-char-sequence</span> is a sequence of two or more <span class="t-spar">c-char</span>s. </li></ul> <div class="t-li1"> +<span class="t-li">1)</span> single-byte integer character constant, e.g. <code>'a'</code> or <code>'\n'</code> or <code>'\13'</code>. Such constant has type <code>int</code> and a value equal to the representation of <span class="t-spar">c-char</span> in the execution character set as a value of type <code>char</code> mapped to <code>int</code>. If <span class="t-spar">c-char</span> is not representable as a single byte in the execution character set, the value is implementation-defined.</div> <div class="t-li1"> +<span class="t-li">2)</span> UTF-8 character constant, e.g. <code>u8'a'</code>. 
Such a constant has type <code>char8_t</code> and a value equal to the ISO 10646 code point value of <span class="t-spar">c-char</span>, provided that the code point value is representable with a single UTF-8 code unit (that is, <span class="t-spar">c-char</span> is in the range 0x0-0x7F, inclusive). If <span class="t-spar">c-char</span> is not representable with a single UTF-8 code unit, the program is ill-formed.</div> <table class="t-rev-begin"> <tr class="t-rev t-until-c23"> +<td> <span class="t-li">3)</span> 16-bit wide character constant, e.g. <code>u'貓'</code>, but not <code>u'🍌'</code> (<code>u'\U0001f34c'</code>). Such a constant has type <code>char16_t</code> and a value equal to the value of <span class="t-spar">c-char</span> in the 16-bit encoding produced by <code><a href="../string/multibyte/mbrtoc16" title="c/string/multibyte/mbrtoc16">mbrtoc16</a></code> (normally UTF-16). If <span class="t-spar">c-char</span> is not representable or maps to more than one 16-bit character, the value is implementation-defined. <span class="t-li">4)</span> 32-bit wide character constant, e.g. <code>U'貓'</code> or <code>U'🍌'</code>. Such a constant has type <code>char32_t</code> and a value equal to the value of <span class="t-spar">c-char</span> in the 32-bit encoding produced by <code><a href="../string/multibyte/mbrtoc32" title="c/string/multibyte/mbrtoc32">mbrtoc32</a></code> (normally UTF-32). If <span class="t-spar">c-char</span> is not representable or maps to more than one 32-bit character, the value is implementation-defined. </td> <td><span class="t-mark-rev t-until-c23">(until C23)</span></td> +</tr> <tr class="t-rev t-since-c23"> +<td> <span class="t-li">3)</span> UTF-16 character constant, e.g. <code>u'貓'</code>, but not <code>u'🍌'</code> (<code>u'\U0001f34c'</code>). 
Such a constant has type <code>char16_t</code> and a value equal to the ISO 10646 code point value of <span class="t-spar">c-char</span>, provided that the code point value is representable with a single UTF-16 code unit (that is, <span class="t-spar">c-char</span> is in the range 0x0-0xD7FF or 0xE000-0xFFFF, inclusive). If <span class="t-spar">c-char</span> is not representable with a single UTF-16 code unit, the program is ill-formed. <span class="t-li">4)</span> UTF-32 character constant, e.g. <code>U'貓'</code> or <code>U'🍌'</code>. Such a constant has type <code>char32_t</code> and a value equal to the ISO 10646 code point value of <span class="t-spar">c-char</span>, provided that the code point value is representable with a single UTF-32 code unit (that is, <span class="t-spar">c-char</span> is in the range 0x0-0xD7FF or 0xE000-0x10FFFF, inclusive). If <span class="t-spar">c-char</span> is not representable with a single UTF-32 code unit, the program is ill-formed. </td> <td><span class="t-mark-rev t-since-c23">(since C23)</span></td> +</tr> </table> <div class="t-li1"> +<span class="t-li">5)</span> wide character constant, e.g. <code>L'β'</code> or <code>L'貓'</code>. Such a constant has type <code>wchar_t</code> and a value equal to the value of <span class="t-spar">c-char</span> in the execution wide character set (that is, the value that would be produced by <code><a href="../string/multibyte/mbtowc" title="c/string/multibyte/mbtowc">mbtowc</a></code>). If <span class="t-spar">c-char</span> is not representable or maps to more than one wide character (e.g. a non-BMP value on Windows where <code>wchar_t</code> is 16-bit), the value is implementation-defined.</div> <div class="t-li1"> +<span class="t-li">6)</span> multicharacter constant, e.g. <code>'AB'</code>, has type <code>int</code> and an implementation-defined value.</div> <div class="t-li1"> +<span class="t-li">7)</span> wide multicharacter constant, e.g. 
<code>L'AB'</code>, has type <code>wchar_t</code> and an implementation-defined value.</div> <div class="t-li1"> +<span class="t-li">8)</span> 16-bit multicharacter constant, e.g. <code>u'CD'</code>, has type <code>char16_t</code> and an implementation-defined value.</div> <div class="t-li1"> +<span class="t-li">9)</span> 32-bit multicharacter constant, e.g. <code>U'XY'</code>, has type <code>char32_t</code> and an implementation-defined value.</div> <h3 id="Notes"> Notes</h3> <p>Multicharacter constants were inherited by C from the B programming language. Although not specified by the C standard, most compilers (MSVC is a notable exception) implement multicharacter constants as specified in B: the value of each char in the constant initializes a successive byte of the resulting integer, in big-endian zero-padded right-adjusted order, e.g. the value of <code>'\1'</code> is <code>0x00000001</code> and the value of <code>'\1\2\3\4'</code> is <code>0x01020304</code>.</p> +<p>In C++, encodable ordinary character literals have type <code>char</code>, rather than <code>int</code>.</p> +<p>Unlike <a href="integer_constant" title="c/language/integer constant">integer constants</a>, a character constant may have a negative value if <code>char</code> is signed: on such implementations <code>'\xFF'</code> is an <code>int</code> with the value <code>-1</code>.</p> +<p>When used in a controlling expression of <a href="../preprocessor/conditional" title="c/preprocessor/conditional"><code> #if</code></a> or <a href="../preprocessor/conditional" title="c/preprocessor/conditional"><code> #elif</code></a>, character constants may be interpreted in terms of the source character set, the execution character set, or some other implementation-defined character set.</p> +<p>16/32-bit multicharacter constants are not widely supported and were removed in C23. Some common implementations (e.g. 
clang) do not accept them at all.</p> +<h3 id="Example"> Example</h3> <div class="t-example"> <div class="c source-c"><pre data-language="c">#include <stddef.h> +#include <stdio.h> +#include <uchar.h> + +int main (void) +{ + printf("constant value \n"); + printf("-------- ----------\n"); + + // integer character constants, + int c1='a'; printf("'a':\t %#010x\n", c1); + int c2='🍌'; printf("'🍌':\t %#010x\n\n", c2); // implementation-defined + + // multicharacter constant + int c3='ab'; printf("'ab':\t %#010x\n\n", c3); // implementation-defined + + // 16-bit wide character constants + char16_t uc1 = u'a'; printf("'a':\t %#010x\n", (int)uc1); + char16_t uc2 = u'¢'; printf("'¢':\t %#010x\n", (int)uc2); + char16_t uc3 = u'猫'; printf("'猫':\t %#010x\n", (int)uc3); + // implementation-defined (🍌 maps to two 16-bit characters) + char16_t uc4 = u'🍌'; printf("'🍌':\t %#010x\n\n", (int)uc4); + + // 32-bit wide character constants + char32_t Uc1 = U'a'; printf("'a':\t %#010x\n", (int)Uc1); + char32_t Uc2 = U'¢'; printf("'¢':\t %#010x\n", (int)Uc2); + char32_t Uc3 = U'猫'; printf("'猫':\t %#010x\n", (int)Uc3); + char32_t Uc4 = U'🍌'; printf("'🍌':\t %#010x\n\n", (int)Uc4); + + // wide character constants + wchar_t wc1 = L'a'; printf("'a':\t %#010x\n", (int)wc1); + wchar_t wc2 = L'¢'; printf("'¢':\t %#010x\n", (int)wc2); + wchar_t wc3 = L'猫'; printf("'猫':\t %#010x\n", (int)wc3); + wchar_t wc4 = L'🍌'; printf("'🍌':\t %#010x\n\n", (int)wc4); +}</pre></div> <p>Possible output:</p> +<div class="text source-text"><pre data-language="c">constant value +-------- ---------- +'a': 0x00000061 +'🍌': 0xf09f8d8c + +'ab': 0x00006162 + +'a': 0x00000061 +'¢': 0x000000a2 +'猫': 0x0000732b +'🍌': 0x0000df4c + +'a': 0x00000061 +'¢': 0x000000a2 +'猫': 0x0000732b +'🍌': 0x0001f34c + +'a': 0x00000061 +'¢': 0x000000a2 +'猫': 0x0000732b +'🍌': 0x0001f34c</pre></div> </div> <h3 id="References"> References</h3> <ul> +<li> C17 standard (ISO/IEC 9899:2018): </li> +<ul><li> 6.4.4.4 Character constants 
(p: 48-50) </li></ul> +<li> C11 standard (ISO/IEC 9899:2011): </li> +<ul><li> 6.4.4.4 Character constants (p: 67-70) </li></ul> +<li> C99 standard (ISO/IEC 9899:1999): </li> +<ul><li> 6.4.4.4 Character constants (p: 59-61) </li></ul> +<li> C89/C90 standard (ISO/IEC 9899:1990): </li> +<ul><li> 3.1.3.4 Character constants </li></ul> +</ul> <h3 id="See_also"> See also</h3> <table class="t-dsc-begin"> <tr class="t-dsc"> <td colspan="2"> <span><a href="https://en.cppreference.com/w/cpp/language/character_literal" title="cpp/language/character literal">C++ documentation</a></span> for <span class=""><span>Character literal</span></span> </td> +</tr> </table> <div class="_attribution"> + <p class="_attribution-p"> + © cppreference.com<br>Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.<br> + <a href="https://en.cppreference.com/w/c/language/character_constant" class="_attribution-link">https://en.cppreference.com/w/c/language/character_constant</a> + </p> +</div> |
