'c-char ' | (1) | |
u8'c-char ' | (2) | (since C23) |
u'c-char ' | (3) | (since C11) |
U'c-char ' | (4) | (since C11) |
L'c-char ' | (5) | |
'c-char-sequence ' | (6) | |
L'c-char-sequence ' | (7) | |
u'c-char-sequence ' | (8) | (since C11)(removed in C23) |
U'c-char-sequence ' | (9) | (since C11)(removed in C23) |
where
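- c-char is one of:
  - a character from the basic source character set other than the single quote ('), backslash (\), or the newline character;
  - an escape sequence: one of the special character escapes \' \" \? \\ \a \b \f \n \r \t \v, hex escapes \x..., or octal escapes \..., as defined in escape sequences;
  - a universal character name, \u... or \U..., as defined in escape sequences (since C99).
- c-char-sequence is a sequence of two or more c-chars.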
1) Single-byte integer character constant, e.g. 'a' or '\n' or '\13'. Such a constant has type int and a value equal to the representation of c-char in the execution character set as a value of type char mapped to int. If c-char is not representable as a single byte in the execution character set, the value is implementation-defined.
2) UTF-8 character constant, e.g. u8'a'. Such a constant has type char8_t and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-8 code unit (that is, c-char is in the range 0x0-0x7F, inclusive). If c-char is not representable with a single UTF-8 code unit, the program is ill-formed. (since C23)
3) 16-bit wide character constant, e.g. u'貓', but not u'🍌' (u'\U0001f34c'). Such a constant has type char16_t and a value equal to the value of c-char in the 16-bit encoding produced by mbrtoc16 (normally UTF-16). If c-char is not representable or maps to more than one 16-bit character, the value is implementation-defined. (until C23)
4) 32-bit wide character constant, e.g. U'貓' or U'🍌'. Such a constant has type char32_t and a value equal to the value of c-char in the 32-bit encoding produced by mbrtoc32 (normally UTF-32). If c-char is not representable or maps to more than one 32-bit character, the value is implementation-defined. (until C23)
3) UTF-16 character constant, e.g. u'貓', but not u'🍌' (u'\U0001f34c'). Such a constant has type char16_t and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-16 code unit (that is, c-char is in the range 0x0-0xD7FF or 0xE000-0xFFFF, inclusive). If c-char is not representable with a single UTF-16 code unit, the program is ill-formed. (since C23)
4) UTF-32 character constant, e.g. U'貓' or U'🍌'. Such a constant has type char32_t and a value equal to the ISO 10646 code point value of c-char, provided that the code point value is representable with a single UTF-32 code unit (that is, c-char is in the range 0x0-0xD7FF or 0xE000-0x10FFFF, inclusive). If c-char is not representable with a single UTF-32 code unit, the program is ill-formed. (since C23)
5) Wide character constant, e.g. L'β' or L'貓'. Such a constant has type wchar_t and a value equal to the value of c-char in the execution wide character set (that is, the value that would be produced by mbtowc). If c-char is not representable or maps to more than one wide character (e.g. a non-BMP value on Windows, where wchar_t is 16-bit), the value is implementation-defined.
6) Multicharacter constant, e.g. 'AB', has type int and implementation-defined value.
7) Wide multicharacter constant, e.g. L'AB', has type wchar_t and implementation-defined value.
8) 16-bit multicharacter constant, e.g. u'CD', has type char16_t and implementation-defined value.
9) 32-bit multicharacter constant, e.g. U'XY', has type char32_t and implementation-defined value.

Multicharacter constants were inherited by C from the B programming language. Although not specified by the C standard, most compilers (MSVC is a notable exception) implement multicharacter constants as specified in B: the values of each char in the constant initialize successive bytes of the resulting integer, in big-endian zero-padded right-adjusted order, e.g. the value of '\1' is 0x00000001 and the value of '\1\2\3\4' is 0x01020304.
In C++, encodable ordinary character literals have type char, rather than int.
Unlike integer constants, a character constant may have a negative value if char is signed: on such implementations '\xFF' is an int with the value -1.
When used in a controlling expression of #if or #elif, character constants may be interpreted in terms of the source character set, the execution character set, or some other implementation-defined character set.
16-bit and 32-bit multicharacter constants were never widely supported and were removed in C23. Some common implementations (e.g. clang) do not accept them at all.
#include <stddef.h>
#include <stdio.h>
#include <uchar.h>

int main(void)
{
    printf("constant value\n");
    printf("-------- ----------\n");

    // integer character constants
    int c1='a'; printf("'a':\t %#010x\n", c1);
    int c2='🍌'; printf("'🍌':\t %#010x\n\n", c2); // implementation-defined

    // multicharacter constant
    int c3='ab'; printf("'ab':\t %#010x\n\n", c3); // implementation-defined

    // 16-bit wide character constants
    char16_t uc1 = u'a'; printf("'a':\t %#010x\n", (int)uc1);
    char16_t uc2 = u'¢'; printf("'¢':\t %#010x\n", (int)uc2);
    char16_t uc3 = u'猫'; printf("'猫':\t %#010x\n", (int)uc3);
    // implementation-defined (🍌 maps to two 16-bit characters)
    char16_t uc4 = u'🍌'; printf("'🍌':\t %#010x\n\n", (int)uc4);

    // 32-bit wide character constants
    char32_t Uc1 = U'a'; printf("'a':\t %#010x\n", (int)Uc1);
    char32_t Uc2 = U'¢'; printf("'¢':\t %#010x\n", (int)Uc2);
    char32_t Uc3 = U'猫'; printf("'猫':\t %#010x\n", (int)Uc3);
    char32_t Uc4 = U'🍌'; printf("'🍌':\t %#010x\n\n", (int)Uc4);

    // wide character constants
    wchar_t wc1 = L'a'; printf("'a':\t %#010x\n", (int)wc1);
    wchar_t wc2 = L'¢'; printf("'¢':\t %#010x\n", (int)wc2);
    wchar_t wc3 = L'猫'; printf("'猫':\t %#010x\n", (int)wc3);
    wchar_t wc4 = L'🍌'; printf("'🍌':\t %#010x\n\n", (int)wc4);
}

Possible output:
constant value
-------- ----------
'a':     0x00000061
'🍌':    0xf09f8d8c

'ab':    0x00006162

'a':     0x00000061
'¢':     0x000000a2
'猫':    0x0000732b
'🍌':    0x0000df4c

'a':     0x00000061
'¢':     0x000000a2
'猫':    0x0000732b
'🍌':    0x0001f34c

'a':     0x00000061
'¢':     0x000000a2
'猫':    0x0000732b
'🍌':    0x0001f34c
C++ documentation for Character literal
© cppreference.com
Licensed under the Creative Commons Attribution-ShareAlike Unported License v3.0.
https://en.cppreference.com/w/c/language/character_constant