diff options
| author | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
| commit | 754bbf7a25a8dda49b5d08ef0d0443bbf5af0e36 (patch) | |
| tree | f1190704f78f04a2b0b4c977d20fe96a828377f1 /devdocs/c/string%2Fmultibyte%2Fmblen.html | |
new repository
Diffstat (limited to 'devdocs/c/string%2Fmultibyte%2Fmblen.html')
| -rw-r--r-- | devdocs/c/string%2Fmultibyte%2Fmblen.html | 73 |
1 files changed, 73 insertions, 0 deletions
diff --git a/devdocs/c/string%2Fmultibyte%2Fmblen.html b/devdocs/c/string%2Fmultibyte%2Fmblen.html new file mode 100644 index 00000000..46facf2a --- /dev/null +++ b/devdocs/c/string%2Fmultibyte%2Fmblen.html @@ -0,0 +1,73 @@ + <h1 id="firstHeading" class="firstHeading">mblen</h1> <table class="t-dcl-begin"> <tr class="t-dsc-header"> <th> Defined in header <code><stdlib.h></code> </th> <th> </th> <th> </th> </tr> <tr class="t-dcl"> <td class="t-dcl-nopad"> <pre data-language="c">int mblen( const char* s, size_t n );</pre> +</td> <td class="t-dcl-nopad"> </td> <td class="t-dcl-nopad"> </td> </tr> </table> <p>Determines the size, in bytes, of the multibyte character whose first byte is pointed to by <code>s</code>.</p> +<p>If <code>s</code> is a null pointer, <span class="t-rev-inl t-until-c23"><span>resets the global conversion state and</span><span><span class="t-mark-rev t-until-c23">(until C23)</span></span></span> determined whether shift sequences are used.</p> +<p>This function is equivalent to the call <code><a href="http://en.cppreference.com/w/c/string/multibyte/mbtowc"><span class="kw575">mbtowc</span></a><span class="br0">(</span><span class="br0">(</span><span class="kw4">wchar_t</span><span class="sy2">*</span><span class="br0">)</span><span class="nu0">0</span>, s, n<span class="br0">)</span></code>, except that conversion state of <code><a href="mbtowc" title="c/string/multibyte/mbtowc">mbtowc</a></code> is unaffected.</p> +<h3 id="Parameters"> Parameters</h3> <table class="t-par-begin"> <tr class="t-par"> <td> s </td> <td> - </td> <td> pointer to the multibyte character </td> +</tr> <tr class="t-par"> <td> n </td> <td> - </td> <td> limit on the number of bytes in s that can be examined </td> +</tr> +</table> <h3 id="Return_value"> Return value</h3> <p>If <code>s</code> is not a null pointer, returns the number of bytes that are contained in the multibyte character or <code>-1</code> if the first bytes pointed to by <code>s</code> do not form a valid multibyte character or <code>0</code> if <code>s</code> is pointing at the null charcter <code>'\0'</code>.</p> +<p>If <code>s</code> is a null pointer, <span class="t-rev-inl t-until-c23"><span>resets its internal conversion state to represent the initial shift state and</span><span><span class="t-mark-rev t-until-c23">(until C23)</span></span></span> returns <code>0</code> if the current multibyte encoding is not state-dependent (does not use shift sequences) or a non-zero value if the current multibyte encoding is state-dependent (uses shift sequences).</p> +<h3 id="Notes"> Notes</h3> <table class="t-rev-begin"> <tr class="t-rev t-until-c23"> +<td> <p>Each call to <code>mblen</code> updates the internal global conversion state (a static object of type <code><a href="mbstate_t" title="c/string/multibyte/mbstate t">mbstate_t</a></code>, only known to this function). If the multibyte encoding uses shift states, care must be taken to avoid backtracking or multiple scans. In any case, multiple threads should not call <code>mblen</code> without synchronization: <code><a href="mbrlen" title="c/string/multibyte/mbrlen">mbrlen</a></code> may be used instead.</p> +</td> <td><span class="t-mark-rev t-until-c23">(until C23)</span></td> +</tr> <tr class="t-rev t-since-c23"> +<td> <p><code>mblen</code> is not allowed to have an internal state.</p> +</td> <td><span class="t-mark-rev t-since-c23">(since C23)</span></td> +</tr> </table> <h3 id="Example"> Example</h3> <div class="t-example"> <div class="c source-c"><pre data-language="c">#include <locale.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +// the number of characters in a multibyte string is the sum of mblen()'s +// note: the simpler approach is mbstowcs(NULL, str, sz) +size_t strlen_mb(const char* ptr) +{ + size_t result = 0; + const char* end = ptr + strlen(ptr); + mblen(NULL, 0); // reset the conversion state + while(ptr < end) { + int next = mblen(ptr, end - ptr); + if (next == -1) { + perror("strlen_mb"); + break; + } + ptr += next; + ++result; + } + return result; +} + +void dump_bytes(const char* str) +{ + for (const char* end = str + strlen(str); str != end; ++str) + printf("%02X ", (unsigned char)str[0]); + printf("\n"); +} + +int main(void) +{ + setlocale(LC_ALL, "en_US.utf8"); + const char* str = "z\u00df\u6c34\U0001f34c"; + printf("The string \"%s\" consists of %zu characters, but %zu bytes: ", + str, strlen_mb(str), strlen(str)); + dump_bytes(str); +}</pre></div> <p>Possible output:</p> +<div class="text source-text"><pre data-language="c">The string "zß水🍌" consists of 4 characters, but 10 bytes: 7A C3 9F E6 B0 B4 F0 9F 8D 8C</pre></div> </div> <h3 id="References"> References</h3> <ul> +<li> C17 standard (ISO/IEC 9899:2018): </li> +<ul><li> 7.22.7.1 The mblen function (p: 260) </li></ul> +<li> C11 standard (ISO/IEC 9899:2011): </li> +<ul><li> 7.22.7.1 The mblen function (p: 357) </li></ul> +<li> C99 standard (ISO/IEC 9899:1999): </li> +<ul><li> 7.20.7.1 The mblen function (p: 321) </li></ul> +<li> C89/C90 standard (ISO/IEC 9899:1990): </li> +<ul><li> 4.10.7.1 The mblen function </li></ul> +</ul> <h3 id="See_also"> See also</h3> <table class="t-dsc-begin"> <tr class="t-dsc"> <td> <div><a href="mbtowc" title="c/string/multibyte/mbtowc"> <span class="t-lines"><span>mbtowc</span></span></a></div> </td> <td> converts the next multibyte character to wide character <br> <span class="t-mark">(function)</span> </td> +</tr> <tr class="t-dsc"> <td> <div><a href="mbrlen" title="c/string/multibyte/mbrlen"> <span class="t-lines"><span>mbrlen</span></span></a></div> +<div><span class="t-lines"><span><span class="t-mark-rev t-since-c95">(C95)</span></span></span></div> </td> <td> returns the number of bytes in the next multibyte character, given state <br> <span class="t-mark">(function)</span> </td> +</tr> <tr class="t-dsc"> <td colspan="2"> <span><a href="https://en.cppreference.com/w/cpp/string/multibyte/mblen" title="cpp/string/multibyte/mblen">C++ documentation</a></span> for <code>mblen</code> </td> +</tr> </table> <div class="_attribution"> + <p class="_attribution-p"> + © cppreference.com<br>Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.<br> + <a href="https://en.cppreference.com/w/c/string/multibyte/mblen" class="_attribution-link">https://en.cppreference.com/w/c/string/multibyte/mblen</a> + </p> +</div> |
