summaryrefslogtreecommitdiff
path: root/devdocs/c/language%2Fstring_literal.html
diff options
context:
space:
mode:
Diffstat (limited to 'devdocs/c/language%2Fstring_literal.html')
-rw-r--r--devdocs/c/language%2Fstring_literal.html123
1 files changed, 123 insertions, 0 deletions
diff --git a/devdocs/c/language%2Fstring_literal.html b/devdocs/c/language%2Fstring_literal.html
new file mode 100644
index 00000000..39b28812
--- /dev/null
+++ b/devdocs/c/language%2Fstring_literal.html
@@ -0,0 +1,123 @@
+ <h1 id="firstHeading" class="firstHeading">String literals</h1> <p>Constructs an unnamed object of specified character array type in-place, used when a character string needs to be embedded in source code.</p>
+<h3 id="Syntax"> Syntax</h3> <table class="t-sdsc-begin"> <tr class="t-sdsc"> <td> <code>"</code> <span class="t-spar">s-char-sequence</span> <code>"</code> </td> <td> (1) </td> <td class="t-sdsc-nopad"> </td>
+</tr> <tr class="t-sdsc"> <td> <code>u8"</code> <span class="t-spar">s-char-sequence</span> <code>"</code> </td> <td> (2) </td> <td> <span class="t-mark-rev t-since-c11">(since C11)</span> </td>
+</tr> <tr class="t-sdsc"> <td> <code>u"</code> <span class="t-spar">s-char-sequence</span> <code>"</code> </td> <td> (3) </td> <td> <span class="t-mark-rev t-since-c11">(since C11)</span> </td>
+</tr> <tr class="t-sdsc"> <td> <code>U"</code> <span class="t-spar">s-char-sequence</span> <code>"</code> </td> <td> (4) </td> <td> <span class="t-mark-rev t-since-c11">(since C11)</span> </td>
+</tr> <tr class="t-sdsc"> <td> <code>L"</code> <span class="t-spar">s-char-sequence</span> <code>"</code> </td> <td> (5) </td> <td class="t-sdsc-nopad"> </td>
+</tr>
+</table> <p>where</p>
+<table class="t-par-begin"> <tr class="t-par"> <td> <span class="t-spar">s-char-sequence</span> </td> <td> - </td> <td> zero or more characters, each of which is either a multibyte character from the source character set (excluding (<code>"</code>), <code>\</code>, and newline), or character escape, hex escape, octal escape<span class="t-rev-inl t-since-c99"><span>, or universal character name</span><span><span class="t-mark-rev t-since-c99">(since C99)</span></span></span> as defined in <a href="escape" title="c/language/escape">escape sequences</a>. </td>
+</tr>
+</table> <div class="t-li1">
+<span class="t-li">1)</span> <i>character string literal</i>: The type of the literal is <code>char[N]</code>, where <code>N</code> is the size of the string in code units of the execution narrow encoding, including the null terminator. Each <code>char</code> element in the array is initialized from the next character in <span class="t-spar">s-char-sequence</span> using the execution character set.</div> <div class="t-li1">
+<span class="t-li">2)</span> <i>UTF-8 string literal</i>: The type of the literal is <span class="t-rev-inl t-until-c23"><span><code>char[N]</code></span><span><span class="t-mark-rev t-until-c23">(until C23)</span></span></span><span class="t-rev-inl t-since-c23"><span><code>char8_t[N]</code></span><span><span class="t-mark-rev t-since-c23">(since C23)</span></span></span>, where <code>N</code> is the size of the string in UTF-8 code units including the null terminator. Each <span class="t-rev-inl t-until-c23"><span><code>char</code></span><span><span class="t-mark-rev t-until-c23">(until C23)</span></span></span><span class="t-rev-inl t-since-c23"><span><code>char8_t</code></span><span><span class="t-mark-rev t-since-c23">(since C23)</span></span></span> element in the array is initialized from the next multibyte character in <span class="t-spar">s-char-sequence</span> using UTF-8 encoding.</div> <table class="t-rev-begin"> <tr class="t-rev t-until-c23">
+<td> <span class="t-li">3)</span> 16-bit wide string literal: The type of the literal is <code>char16_t[N]</code>, where <code>N</code> is the size of the string in code units of implementation-defined 16-bit encoding (typically UTF-16), including the null terminator. Each <code>char16_t</code> element in the array is initialized as if by executing <code><a href="../string/multibyte/mbrtoc16" title="c/string/multibyte/mbrtoc16">mbrtoc16</a></code> in implementation-defined locale. <span class="t-li">4)</span> 32-bit wide string literal: The type of the literal is <code>char32_t[N]</code>, where <code>N</code> is the size of the string in code units of implementation-defined 32-bit encoding (typically UTF-32), including the null terminator. Each <code>char32_t</code> element in the array is initialized as if by executing <code><a href="../string/multibyte/mbrtoc32" title="c/string/multibyte/mbrtoc32">mbrtoc32</a></code> in implementation-defined locale. </td> <td><span class="t-mark-rev t-until-c23">(until C23)</span></td>
+</tr> <tr class="t-rev t-since-c23">
+<td> <span class="t-li">3)</span> <i>UTF-16 string literal</i>: The type of the literal is <code>char16_t[N]</code>, where <code>N</code> is the size of the string in UTF-16 code units including the null terminator. Each <code>char16_t</code> element in the array is initialized from the next multibyte character in <span class="t-spar">s-char-sequence</span> using UTF-16 encoding. <span class="t-li">4)</span> <i>UTF-32 string literal</i>: The type of the literal is <code>char32_t[N]</code>, where <code>N</code> is the size of the string in UTF-32 code units including the null terminator. Each <code>char32_t</code> element in the array is initialized from the next multibyte character in <span class="t-spar">s-char-sequence</span> using UTF-32 encoding. </td> <td><span class="t-mark-rev t-since-c23">(since C23)</span></td>
+</tr> </table> <div class="t-li1">
+<span class="t-li">5)</span> wide string literal: The type of the literal is <code>wchar_t[N]</code>, where <code>N</code> is the size of the string in code units of the execution wide encoding, including the null terminator. Each <code>wchar_t</code> element in the array is initialized as if by executing <code><a href="../string/multibyte/mbstowcs" title="c/string/multibyte/mbstowcs">mbstowcs</a></code> in implementation-defined locale.</div> <h3 id="Explanation"> Explanation</h3> <p>First, at <a href="translation_phases" title="c/language/translation phases">translation phase 6</a> (after macro expansion), the adjacent string literals (that is, string literals separated by whitespace only) are concatenated.</p>
+<table class="t-rev-begin"> <tr class="t-rev t-until-c99">
+<td> <p>Only two narrow or two wide string literals may be concatenated.</p>
+</td> <td><span class="t-mark-rev t-until-c99">(until C99)</span></td>
+</tr> <tr class="t-rev t-since-c99">
+<td> <p>If one literal is unprefixed, the resulting string literal has the width/encoding specified by the prefixed literal.</p>
+<div class="c source-c"><pre data-language="c">L"Δx = %" PRId16 // at phase 4, PRId16 expands to "d"
+ // at phase 6, L"Δx = %" and "d" form L"Δx = %d"</pre></div> </td> <td><span class="t-mark-rev t-since-c99">(since C99)</span></td>
+</tr> </table> <table class="t-rev-begin"> <tr class="t-rev t-since-c11 t-until-c23">
+<td> <p>If the two string literals have different encoding prefixes, concatenation is implementation-defined, except that a UTF-8 string literal and a wide string literal cannot be concatenated.</p>
+</td> <td>
+<span class="t-mark-rev t-since-c11">(since C11)</span><br><span class="t-mark-rev t-until-c23">(until C23)</span>
+</td>
+</tr> <tr class="t-rev t-since-c23">
+<td> <p>If the two string literals have different encoding prefixes, concatenation is ill-formed.</p>
+</td> <td><span class="t-mark-rev t-since-c23">(since C23)</span></td>
+</tr> </table> <p>Secondly, at <a href="translation_phases" title="c/language/translation phases">translation phase 7</a>, a terminating null character is added to each string literal, and then each literal initializes an unnamed array with static <a href="storage_duration" title="c/language/storage duration">storage duration</a> and length just enough to contain the contents of the string literal plus one for the null terminator.</p>
+<div class="c source-c"><pre data-language="c">char* p = "\x12" "3"; // creates a static char[3] array holding {'\x12', '3', '\0'}
+ // sets p to point to the first element of the array</pre></div> <p>String literals are <b>not modifiable</b> (and in fact may be placed in read-only memory such as <code>.rodata</code>). If a program attempts to modify the static array formed by a string literal, the behavior is undefined.</p>
+<div class="c source-c"><pre data-language="c">char* p = "Hello";
+p[1] = 'M'; // Undefined behavior
+char a[] = "Hello";
+a[1] = 'M'; // OK: a is not a string literal</pre></div> <p>It is neither required nor forbidden for identical string literals to refer to the same location in memory. Moreover, overlapping string literals or string literals that are substrings of other string literals may be combined.</p>
+<div class="c source-c"><pre data-language="c">"def" == 3+"abcdef"; // may be 1 or 0, implementation-defined</pre></div> <h3 id="Notes"> Notes</h3> <p>A string literal is not necessarily a string; if a string literal has embedded null characters, it represents an array which contains more than one string:</p>
+<div class="c source-c"><pre data-language="c">char* p = "abc\0def"; // strlen(p) == 3, but the array has size 8</pre></div> <p>If a valid hex digit follows a hex escape in a string literal, it would fail to compile as an invalid escape sequence, but string concatenation can be used as a workaround:</p>
+<div class="c source-c"><pre data-language="c">//char* p = "\xfff"; // error: hex escape sequence out of range
+char* p = "\xff""f"; // okay, the literal is char[3] holding {'\xff', 'f', '\0'}</pre></div> <p>String literals can be used to <a href="array_initialization" title="c/language/array initialization">initialize arrays</a>, and if the size of the array is one less the size of the string literal, the null terminator is ignored:</p>
+<div class="c source-c"><pre data-language="c">char a1[] = "abc"; // a1 is char[4] holding {'a', 'b', 'c', '\0'}
+char a2[4] = "abc"; // a2 is char[4] holding {'a', 'b', 'c', '\0'}
+char a3[3] = "abc"; // a3 is char[3] holding {'a', 'b', 'c'}</pre></div> <p>The encoding of character string literals <span class="t-v">(1)</span> and wide string literals <span class="t-v">(5)</span> is implementation-defined. For example, gcc selects them with the <a rel="nofollow" class="external text" href="https://gcc.gnu.org/onlinedocs/cpp/Invocation.html">commandline options</a> <code>-fexec-charset</code> and <code>-fwide-exec-charset</code>.</p>
+<p>Although mixed wide string literal concatenation is allowed in C11, almost all compilers reject such concatenation (the only known exception is <a rel="nofollow" class="external text" href="http://sdcc.sourceforge.net/">SDCC</a>), and its usage experience is unknown. As a result, allowance of mixed wide string literal concatenation is removed in C23.</p>
+<h3 id="Example"> Example</h3> <div class="t-example"> <div class="c source-c"><pre data-language="c">#include &lt;inttypes.h&gt;
+#include &lt;locale.h&gt;
+#include &lt;stddef.h&gt;
+#include &lt;stdio.h&gt;
+#include &lt;stdlib.h&gt;
+#include &lt;uchar.h&gt;
+
+int main(void)
+{
+ char s1[] = "a猫🍌"; // or "a\u732B\U0001F34C"
+#if __STDC_VERSION__ &gt;= 202311L
+ char8_t
+#else
+ char
+#endif
+ s2[] = u8"a猫🍌";
+ char16_t s3[] = u"a猫🍌";
+ char32_t s4[] = U"a猫🍌";
+ wchar_t s5[] = L"a猫🍌";
+
+ setlocale(LC_ALL, "en_US.utf8");
+ printf(" \"%s\" is a char[%zu] holding { ", s1, sizeof s1 / sizeof *s1);
+ for(size_t n = 0; n &lt; sizeof s1 / sizeof *s1; ++n)
+ printf("0x%02X ", +(unsigned char)s1[n]);
+ puts("}");
+ printf(
+#if __STDC_VERSION__ &gt;= 202311L
+ "u8\"%s\" is a char8_t[%zu] holding { "
+#else
+ "u8\"%s\" is a char[%zu] holding { "
+#endif
+, s2, sizeof s2 / sizeof *s2);
+ for(size_t n = 0; n &lt; sizeof s2 / sizeof *s2; ++n)
+#if __STDC_VERSION__ &gt;= 202311L
+ printf("0x%02X ", s2[n]);
+#else
+ printf("0x%02X ", +(unsigned char)s2[n]);
+#endif
+ puts("}");
+ printf(" u\"a猫🍌\" is a char16_t[%zu] holding { ", sizeof s3 / sizeof *s3);
+ for(size_t n = 0; n &lt; sizeof s3 / sizeof *s3; ++n)
+ printf("0x%04" PRIXLEAST16" ", s3[n]);
+ puts("}");
+ printf(" U\"a猫🍌\" is a char32_t[%zu] holding { ", sizeof s4 / sizeof *s4);
+ for(size_t n = 0; n &lt; sizeof s4 / sizeof *s4; ++n)
+ printf("0x%08" PRIXLEAST32" ", s4[n]);
+ puts("}");
+ printf(" L\"%ls\" is a wchar_t[%zu] holding { ", s5, sizeof s5 / sizeof *s5);
+ for(size_t n = 0; n &lt; sizeof s5 / sizeof *s5; ++n)
+ printf("0x%08X ", (unsigned)s5[n]);
+ puts("}");
+}</pre></div> <p>Possible output:</p>
+<div class="text source-text"><pre data-language="c">
+ "a猫🍌" is a char[9] holding { 0x61 0xE7 0x8C 0xAB 0xF0 0x9F 0x8D 0x8C 0x00 }
+u8"a猫🍌" is a char[9] holding { 0x61 0xE7 0x8C 0xAB 0xF0 0x9F 0x8D 0x8C 0x00 }
+ u"a猫🍌" is a char16_t[5] holding { 0x0061 0x732B 0xD83C 0xDF4C 0x0000 }
+ U"a猫🍌" is a char32_t[4] holding { 0x00000061 0x0000732B 0x0001F34C 0x00000000 }
+ L"a猫🍌" is a wchar_t[4] holding { 0x00000061 0x0000732B 0x0001F34C 0x00000000 }</pre></div> </div> <h3 id="References"> References</h3> <ul>
+<li> C23 standard (ISO/IEC 9899:2023): </li>
+<ul><li> 6.4.5 String literals (p: TBD) </li></ul>
+<li> C17 standard (ISO/IEC 9899:2018): </li>
+<ul><li> 6.4.5 String literals (p: 50-52) </li></ul>
+<li> C11 standard (ISO/IEC 9899:2011): </li>
+<ul><li> 6.4.5 String literals (p: 70-72) </li></ul>
+<li> C99 standard (ISO/IEC 9899:1999): </li>
+<ul><li> 6.4.5 String literals (p: 62-63) </li></ul>
+<li> C89/C90 standard (ISO/IEC 9899:1990): </li>
+<ul><li> 3.1.4 String literals </li></ul>
+</ul> <h3 id="See_also"> See also</h3> <table class="t-dsc-begin"> <tr class="t-dsc"> <td colspan="2"> <span><a href="https://en.cppreference.com/w/cpp/language/string_literal" title="cpp/language/string literal">C++ documentation</a></span> for <span class=""><span>string literal</span></span> </td>
+</tr> </table> <div class="_attribution">
+ <p class="_attribution-p">
+ &copy; cppreference.com<br>Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.<br>
+ <a href="https://en.cppreference.com/w/c/language/string_literal" class="_attribution-link">https://en.cppreference.com/w/c/language/string_literal</a>
+ </p>
+</div>