| Defined in header <stdlib.h> |
|---|
| int mblen( const char* s, size_t n ); |
Determines the size, in bytes, of the multibyte character whose first byte is pointed to by s.
If s is a null pointer, resets the global conversion state (until C23) and determines whether shift sequences are used.
This function is equivalent to the call mbtowc((wchar_t*)0, s, n), except that the conversion state of mbtowc is unaffected.
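The equivalence with mbtowc can be checked with a minimal sketch such as the one below. The helper name `lengths_agree` is hypothetical and not part of any library. The sketch assumes a stateless encoding, so that calling both functions on the same bytes does not interact through shift state.

```c
#include <stdlib.h>

/* Hypothetical helper (not a library function): returns 1 when mblen and
   mbtowc report the same result for the character starting at s. */
int lengths_agree(const char* s, size_t n)
{
    int from_mblen  = mblen(s, n);
    /* A null destination asks mbtowc only for the length, nothing is stored. */
    int from_mbtowc = mbtowc((wchar_t*)0, s, n);
    return from_mblen == from_mbtowc;
}
```

Calling `lengths_agree("a", 1)` in the default "C" locale is expected to report agreement, since both functions see a single one-byte character.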
| s | - | pointer to the multibyte character |
| n | - | limit on the number of bytes in s that can be examined |
If s is not a null pointer, returns the number of bytes contained in the multibyte character, -1 if the first bytes pointed to by s do not form a valid multibyte character, or 0 if s points at the null character '\0'.
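The three possible outcomes for a non-null s can be told apart as in the following sketch. The enum `mb_kind` and the function `classify_mb` are illustrative names introduced here, not part of the standard library.

```c
#include <stdlib.h>

/* Illustrative result categories (hypothetical names). */
enum mb_kind { KIND_INVALID, KIND_TERMINATOR, KIND_CHARACTER };

/* Classifies what mblen finds at s, examining at most n bytes. */
enum mb_kind classify_mb(const char* s, size_t n)
{
    int len = mblen(s, n);
    if (len < 0)
        return KIND_INVALID;     /* -1: bytes do not form a valid character */
    if (len == 0)
        return KIND_TERMINATOR;  /* 0: s points at the null character */
    return KIND_CHARACTER;       /* > 0: length of the character in bytes */
}
```

Which byte sequences count as invalid depends on the current LC_CTYPE locale, so the same bytes may be classified differently after a call to setlocale.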
If s is a null pointer, resets its internal conversion state to represent the initial shift state (until C23) and returns 0 if the current multibyte encoding is not state-dependent (does not use shift sequences) or a non-zero value if the current multibyte encoding is state-dependent (uses shift sequences).
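A null-pointer call can serve as a probe for state-dependent encodings, as in this small sketch. The function name `encoding_is_stateful` is hypothetical. The result reflects whichever locale is active when it is called.

```c
#include <stdlib.h>

/* Hypothetical helper: returns 1 if the current multibyte encoding uses
   shift sequences, 0 otherwise. The second argument is ignored when the
   first is a null pointer. */
int encoding_is_stateful(void)
{
    return mblen(NULL, 0) != 0;
}
```

In the default "C" locale, and in UTF-8 locales, the encoding is stateless, so this is expected to return 0. Encodings that use shift sequences, such as ISO-2022-JP, would be reported as stateful where the implementation supports them.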
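| Each call to mblen updates the internal global conversion state (a static object of type mbstate_t, known only to this function). If the multibyte encoding uses shift states, care must be taken to avoid backtracking or multiple scans. In any case, multiple threads should not call mblen without synchronization: mbrlen may be used instead. | (until C23) |
| mblen is not allowed to have an internal state. | (since C23) |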
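Where hidden global state is a concern, for example when several threads scan strings, a caller can keep the conversion state in its own mbstate_t object and use mbrlen. The following is a sketch of that approach, not a drop-in replacement. The function name `count_chars_r` is hypothetical.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts multibyte characters in s using a
   caller-local conversion state. Returns (size_t)-1 on an invalid or
   truncated sequence. */
size_t count_chars_r(const char* s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);  /* zeroed object = initial state */

    size_t count = 0;
    size_t remaining = strlen(s);
    while (remaining > 0) {
        size_t len = mbrlen(s, remaining, &state);
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;        /* invalid or incomplete character */
        if (len == 0)
            break;                    /* null character: end of string */
        s += len;
        remaining -= len;
        ++count;
    }
    return count;
}
```

Because the state lives on the caller's stack, concurrent calls do not share conversion state. The result still depends on the process-wide locale.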
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
// the number of characters in a multibyte string is the sum of mblen()'s
// note: the simpler approach is mbstowcs(NULL, str, sz)
size_t strlen_mb(const char* ptr)
{
    size_t result = 0;
    const char* end = ptr + strlen(ptr);
    mblen(NULL, 0); // reset the conversion state
    while(ptr < end) {
        int next = mblen(ptr, end - ptr);
        if (next == -1) {
            perror("strlen_mb");
            break;
        }
        ptr += next;
        ++result;
    }
    return result;
}

void dump_bytes(const char* str)
{
    for (const char* end = str + strlen(str); str != end; ++str)
        printf("%02X ", (unsigned char)str[0]);
    printf("\n");
}

int main(void)
{
    setlocale(LC_ALL, "en_US.utf8");
    const char* str = "z\u00df\u6c34\U0001f34c";
    printf("The string \"%s\" consists of %zu characters, but %zu bytes: ",
           str, strlen_mb(str), strlen(str));
    dump_bytes(str);
}

Possible output:
The string "zß水🍌" consists of 4 characters, but 10 bytes: 7A C3 9F E6 B0 B4 F0 9F 8D 8C
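The comment in the example mentions mbstowcs as a simpler way to count characters. A minimal sketch of that alternative follows. It assumes the POSIX extension under which a null destination makes mbstowcs return the number of wide characters required. The function name `strlen_mb_simple` is hypothetical.

```c
#include <stdlib.h>

/* Hypothetical helper: number of wide characters needed to hold str,
   or (size_t)-1 if str contains an invalid multibyte sequence.
   Assumes the POSIX behavior of mbstowcs with a null destination. */
size_t strlen_mb_simple(const char* str)
{
    return mbstowcs(NULL, str, 0);
}
```

On platforms where wchar_t cannot represent every character in one unit, the count of wide characters may differ from the count of multibyte characters, so the two approaches are not guaranteed to agree everywhere.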
| mbtowc | converts the next multibyte character to wide character (function) |
| mbrlen (C95) | returns the number of bytes in the next multibyte character, given state (function) |

C++ documentation for mblen
© cppreference.com
Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.
https://en.cppreference.com/w/c/string/multibyte/mblen