| author | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
| commit | 754bbf7a25a8dda49b5d08ef0d0443bbf5af0e36 (patch) | |
| tree | f1190704f78f04a2b0b4c977d20fe96a828377f1 /devdocs/c/string%2Fmultibyte%2Fc16rtomb.html | |
new repository
Diffstat (limited to 'devdocs/c/string%2Fmultibyte%2Fc16rtomb.html')
| -rw-r--r-- | devdocs/c/string%2Fmultibyte%2Fc16rtomb.html | 67 |
1 files changed, 67 insertions, 0 deletions
diff --git a/devdocs/c/string%2Fmultibyte%2Fc16rtomb.html b/devdocs/c/string%2Fmultibyte%2Fc16rtomb.html new file mode 100644 index 00000000..5f1da1ad --- /dev/null +++ b/devdocs/c/string%2Fmultibyte%2Fc16rtomb.html @@ -0,0 +1,67 @@ + <h1 id="firstHeading" class="firstHeading">c16rtomb</h1> <table class="t-dcl-begin"> <tr class="t-dsc-header"> <th> Defined in header <code><uchar.h></code> </th> <th> </th> <th> </th> </tr> <tr class="t-dcl t-since-c11"> <td> <pre data-language="c">size_t c16rtomb( char* restrict s, char16_t c16, mbstate_t* restrict ps );</pre> +</td> <td class="t-dcl-nopad"> </td> <td> <span class="t-mark-rev t-since-c11">(since C11)</span> </td> </tr> </table> <p>Converts a single code point from its variable-length 16-bit wide character representation (typically, UTF-16) to its narrow multibyte character representation.</p> +<p>If <code>s</code> is not a null pointer and <code>c16</code> is the last 16-bit code unit in a valid variable-length encoding of a code point, the function determines the number of bytes necessary to store the multibyte character representation of that code point (including any shift sequences, and taking into account the current multibyte conversion state <code>*ps</code>), and stores the multibyte character representation in the character array whose first element is pointed to by <code>s</code>, updating <code>*ps</code> as necessary. 
At most <code>MB_CUR_MAX</code> bytes can be written by this function.</p> +<p>If <code>s</code> is a null pointer, the call is equivalent to <code>c16rtomb(buf, u'\0', ps)</code> for some internal buffer <code>buf</code>.</p> +<p>If <code>c16</code> is the null wide character <code>u'\0'</code>, a null byte is stored, preceded by any shift sequence necessary to restore the initial shift state and the conversion state parameter <code>*ps</code> is updated to represent the initial shift state.</p> +<p>If <code>c16</code> is not the final code unit in a 16-bit representation of a wide character, it does not write to the array pointed to by <code>s</code>, only <code>*ps</code> is updated.</p> +<p>If the macro <code>__STDC_UTF_16__</code> is defined, the 16-bit encoding used by this function is UTF-16; otherwise, it is implementation-defined. <span class="t-rev-inl t-since-c23"><span>The macro is always defined and the encoding is always UTF-16.</span><span><span class="t-mark-rev t-since-c23">(since C23)</span></span></span> In any case, the multibyte character encoding used by this function is specified by the currently active C locale.</p> +<h3 id="Parameters"> Parameters</h3> <table class="t-par-begin"> <tr class="t-par"> <td> s </td> <td> - </td> <td> pointer to narrow character array where the multibyte character will be stored </td> +</tr> <tr class="t-par"> <td> c16 </td> <td> - </td> <td> the 16-bit wide character to convert </td> +</tr> <tr class="t-par"> <td> ps </td> <td> - </td> <td> pointer to the conversion state object used when interpreting the multibyte string </td> +</tr> +</table> <h3 id="Return_value"> Return value</h3> <p>On success, returns the number of bytes (including any shift sequences) written to the character array whose first element is pointed to by <code>s</code>. This value may be <code>0</code>, e.g. 
when processing the leading <code>char16_t</code> units in a multi-<code>char16_t</code>-unit sequence (occurs when processing the leading surrogate in a surrogate pair of UTF-16).</p> +<p>On failure (if <code>c16</code> is not a valid 16-bit code unit), returns <code>-1</code>, stores <code><a href="../../error/errno_macros" title="c/error/errno macros">EILSEQ</a></code> in <code><a href="../../error/errno" title="c/error/errno">errno</a></code>, and leaves <code>*ps</code> in an unspecified state.</p> +<h3 id="Notes"> Notes</h3> <p>In C11 as published, unlike <code><a href="mbrtoc16" title="c/string/multibyte/mbrtoc16">mbrtoc16</a></code>, which converts variable-width multibyte (such as UTF-8) to variable-width 16-bit (such as UTF-16) encoding, this function can only convert single-unit 16-bit encoding, meaning it cannot convert UTF-16 to UTF-8 despite that being the original intent of this function. This was corrected by the post-C11 defect report <a rel="nofollow" class="external text" href="https://open-std.org/JTC1/SC22/WG14/www/docs/n2059.htm#dr_488">DR488</a>.</p> +<h3 id="Example"> Example</h3> <div class="t-example"> +<p>Note: this example assumes the fix for the defect report <a rel="nofollow" class="external text" href="https://open-std.org/JTC1/SC22/WG14/www/docs/n2059.htm#dr_488">DR488</a> is applied.</p> +<div class="c source-c"><pre data-language="c">#include <locale.h> +#include <stdio.h> +#include <stdlib.h> +#include <uchar.h> + +mbstate_t state; + +int main(void) +{ + setlocale(LC_ALL, "en_US.utf8"); + const char16_t in[] = u"zß水🍌"; // or u"z\u00df\u6c34\U0001F34C" + const size_t in_sz = sizeof in / sizeof *in; + + printf("Processing %zu UTF-16 code units: [ ", in_sz); + for (size_t n = 0; n < in_sz; ++n) + printf("%#x ", in[n]); + puts("]"); + + char out[MB_CUR_MAX * in_sz]; + char *p = out; + for (size_t n = 0; n < in_sz; ++n) + { + size_t rc = c16rtomb(p, in[n], &state); + if (rc == (size_t)-1) + break; + p += rc; + } + + size_t out_sz = p - 
out; + printf("into %zu UTF-8 code units: [ ", out_sz); + for (size_t x = 0; x < out_sz; ++x) + printf("%#x ", +(unsigned char)out[x]); + puts("]"); +}</pre></div> <p>Output:</p> +<div class="text source-text"><pre data-language="c">Processing 6 UTF-16 code units: [ 0x7a 0xdf 0x6c34 0xd83c 0xdf4c 0 ] +into 11 UTF-8 code units: [ 0x7a 0xc3 0x9f 0xe6 0xb0 0xb4 0xf0 0x9f 0x8d 0x8c 0 ]</pre></div> </div> <h3 id="References"> References</h3> <ul> +<li> C23 standard (ISO/IEC 9899:2023): </li> +<ul><li> 7.28.1.2 The c16rtomb function (p: TBD) </li></ul> +<li> C17 standard (ISO/IEC 9899:2018): </li> +<ul><li> 7.28.1.2 The c16rtomb function (p: TBD) </li></ul> +<li> C11 standard (ISO/IEC 9899:2011): </li> +<ul><li> 7.28.1.2 The c16rtomb function (p: 399-400) </li></ul> +</ul> <h3 id="See_also"> See also</h3> <table class="t-dsc-begin"> <tr class="t-dsc"> <td> <div><a href="mbrtoc16" title="c/string/multibyte/mbrtoc16"> <span class="t-lines"><span>mbrtoc16</span></span></a></div> +<div><span class="t-lines"><span><span class="t-mark-rev t-since-c11">(C11)</span></span></span></div> </td> <td> generates the next 16-bit wide character from a narrow multibyte string <br> <span class="t-mark">(function)</span> </td> +</tr> <tr class="t-dsc"> <td colspan="2"> <span><a href="https://en.cppreference.com/w/cpp/string/multibyte/c16rtomb" title="cpp/string/multibyte/c16rtomb">C++ documentation</a></span> for <code>c16rtomb</code> </td> +</tr> </table> <div class="_attribution"> + <p class="_attribution-p"> + © cppreference.com<br>Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.<br> + <a href="https://en.cppreference.com/w/c/string/multibyte/c16rtomb" class="_attribution-link">https://en.cppreference.com/w/c/string/multibyte/c16rtomb</a> + </p> +</div> |
