
 <h3 class="section">Character Codes</h3>  <p>The unibyte and multibyte text representations use different character codes. The valid character codes for unibyte representation range from 0 to <code>#xFF</code> (255)—the values that can fit in one byte. The valid character codes for multibyte representation range from 0 to <code>#x3FFFFF</code>. In this code space, values 0 through <code>#x7F</code> (127) are for <acronym>ASCII</acronym> characters, and values <code>#x80</code> (128) through <code>#x3FFF7F</code> (4194175) are for non-<acronym>ASCII</acronym> characters. </p> <p>Emacs character codes are a superset of the Unicode standard. Values 0 through <code>#x10FFFF</code> (1114111) correspond to Unicode characters of the same codepoint; values <code>#x110000</code> (1114112) through <code>#x3FFF7F</code> (4194175) represent characters that are not unified with Unicode; and values <code>#x3FFF80</code> (4194176) through <code>#x3FFFFF</code> (4194303) represent eight-bit raw bytes. </p> <dl> <dt id="characterp">Function: <strong>characterp</strong> <em>charcode</em>
</dt> <dd>
<p>This returns <code>t</code> if <var>charcode</var> is a valid character, and <code>nil</code> otherwise. </p> <div class="example"> <pre class="example">(characterp 65)
     ⇒ t
</pre>
<pre class="example">(characterp 4194303)
     ⇒ t
</pre>
<pre class="example">(characterp 4194304)
     ⇒ nil
</pre>
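</div> <p>The code ranges described at the start of this section can be sketched as a small classifier built on <code>characterp</code>. This is only an illustration: <code>my-char-code-class</code> and the symbols it returns are hypothetical names, not part of Emacs. </p> <div class="example"> <pre class="example">;; Hypothetical helper, not part of Emacs: name the range that CODE
;; falls in, using the boundaries given in this section.
(defun my-char-code-class (code)
  (cond ((not (characterp code)) nil)      ; not a valid character code
        ((>= #x7F code) 'ascii)            ; 0 through #x7F
        ((>= #x10FFFF code) 'unicode)      ; #x80 through #x10FFFF
        ((>= #x3FFF7F code) 'not-unified)  ; #x110000 through #x3FFF7F
        (t 'raw-byte)))                    ; #x3FFF80 through #x3FFFFF

(my-char-code-class 65)
     ;; ⇒ ascii
(my-char-code-class #x03A3)
     ;; ⇒ unicode
(my-char-code-class #x110000)
     ;; ⇒ not-unified
(my-char-code-class #x3FFF80)
     ;; ⇒ raw-byte
(my-char-code-class 4194304)
     ;; ⇒ nil
</pre>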
</div> </dd>
</dl>   <dl> <dt id="max-char">Function: <strong>max-char</strong>
</dt> <dd>
<p>This function returns the largest value that a valid character codepoint can have. </p> <div class="example"> <pre class="example">(characterp (max-char))
     ⇒ t
</pre>
<pre class="example">(characterp (1+ (max-char)))
     ⇒ nil
</pre>
</div> </dd>
</dl> <dl> <dt id="char-from-name">Function: <strong>char-from-name</strong> <em>string &amp;optional ignore-case</em>
</dt> <dd>
<p>This function returns the character whose Unicode name is <var>string</var>. If <var>ignore-case</var> is non-<code>nil</code>, case is ignored in <var>string</var>. This function returns <code>nil</code> if <var>string</var> does not name a character. </p> <div class="example"> <pre class="example">;; U+03A3
(= (char-from-name "GREEK CAPITAL LETTER SIGMA") #x03A3)
     ⇒ t
</pre>
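</div> <p>The effect of <var>ignore-case</var> and the <code>nil</code> return value can be seen in a short sketch. It assumes that Unicode names are stored in upper case, so a lower-case name should match only when <var>ignore-case</var> is non-<code>nil</code>; the last name is a made-up string that names no character. </p> <div class="example"> <pre class="example">;; Without IGNORE-CASE, a lower-case name should not match.
(char-from-name "greek capital letter sigma")
     ;; ⇒ nil
;; With IGNORE-CASE non-nil, the same string names U+03A3.
(char-from-name "greek capital letter sigma" t)
     ;; ⇒ 931
;; A string that names no character yields nil.
(char-from-name "NO SUCH CHARACTER NAME")
     ;; ⇒ nil
</pre>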
</div> </dd>
</dl> <dl> <dt id="get-byte">Function: <strong>get-byte</strong> <em>&amp;optional pos string</em>
</dt> <dd>
<p>This function returns the byte at character position <var>pos</var> in the current buffer; if <var>pos</var> is omitted or <code>nil</code>, it defaults to point. If the current buffer is unibyte, this is literally the byte at that position. If the buffer is multibyte, byte values of <acronym>ASCII</acronym> characters are the same as character codepoints, whereas eight-bit raw bytes are converted to their 8-bit codes. The function signals an error if the character at <var>pos</var> is neither an <acronym>ASCII</acronym> character nor an eight-bit raw byte. </p> <p>The optional argument <var>string</var> means to get a byte value from that string instead of the current buffer; in that case <var>pos</var> is a zero-based index into <var>string</var>. </p>
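<p>The three cases described above can be sketched as follows. Note that positions in a string count from 0, while buffer positions count from 1. The symbol <code>not-a-byte</code> is only a placeholder returned by the illustrative error handler, not something Emacs defines. </p> <div class="example"> <pre class="example">;; ASCII character in a string: the byte equals the codepoint.
(get-byte 0 "A")
     ;; ⇒ 65
;; An eight-bit raw byte in a multibyte string maps back to its 8-bit code.
(get-byte 0 (string-to-multibyte "\377"))
     ;; ⇒ 255
;; Any other non-ASCII character signals an error, caught here.
(condition-case nil
    (get-byte 0 "\u00e9")
  (error 'not-a-byte))
     ;; ⇒ not-a-byte
;; Reading from the current buffer instead of a string.
(with-temp-buffer
  (insert "A")
  (get-byte 1))
     ;; ⇒ 65
</pre>
</div>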
</dd>
</dl><div class="_attribution">
  <p class="_attribution-p">
    Copyright &copy; 1990-1996, 1998-2022 Free Software Foundation, Inc. <br>Licensed under the GNU GPL license.<br>
    <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Codes.html" class="_attribution-link">https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Codes.html</a>
  </p>
</div>