Diffstat (limited to 'devdocs/elisp/character-sets.html')
-rw-r--r-- devdocs/elisp/character-sets.html | 36
1 files changed, 36 insertions, 0 deletions
diff --git a/devdocs/elisp/character-sets.html b/devdocs/elisp/character-sets.html
new file mode 100644
index 00000000..a5ee0fcb
--- /dev/null
+++ b/devdocs/elisp/character-sets.html
@@ -0,0 +1,36 @@
+ <h3 class="section">Character Sets</h3> <p>An Emacs <em>character set</em>, or <em>charset</em>, is a set of characters in which each character is assigned a numeric code point. (The Unicode Standard calls this a <em>coded character set</em>.) Each Emacs charset has a name which is a symbol. A single character can belong to any number of different character sets, but it will generally have a different code point in each charset. Examples of character sets include <code>ascii</code>, <code>iso-8859-1</code>, <code>greek-iso8859-7</code>, and <code>windows-1255</code>. The code point assigned to a character in a charset is usually different from its code point used in Emacs buffers and strings. </p> <p>Emacs defines several special character sets. The character set <code>unicode</code> includes all the characters whose Emacs code points are in the range <code>0..#x10FFFF</code>. The character set <code>emacs</code> includes all <acronym>ASCII</acronym> and non-<acronym>ASCII</acronym> characters. Finally, the <code>eight-bit</code> charset includes the 8-bit raw bytes; Emacs uses it to represent raw bytes encountered in text. </p> <dl> <dt id="charsetp">Function: <strong>charsetp</strong> <em>object</em>
+</dt> <dd><p>Returns <code>t</code> if <var>object</var> is a symbol that names a character set, <code>nil</code> otherwise. </p></dd>
+</dl> <dl> <dt id="charset-list">Variable: <strong>charset-list</strong>
+</dt> <dd><p>The value is a list of all defined character set names. </p></dd>
+</dl> <dl> <dt id="charset-priority-list">Function: <strong>charset-priority-list</strong> <em>&amp;optional highestp</em>
+</dt> <dd><p>This function returns a list of all defined character sets ordered by their priority. If <var>highestp</var> is non-<code>nil</code>, the function returns a single character set of the highest priority. </p></dd>
+</dl> <dl> <dt id="set-charset-priority">Function: <strong>set-charset-priority</strong> <em>&amp;rest charsets</em>
+</dt> <dd><p>This function makes <var>charsets</var> the highest priority character sets. </p></dd>
+</dl> <dl> <dt id="char-charset">Function: <strong>char-charset</strong> <em>character &amp;optional restriction</em>
+</dt> <dd>
+<p>This function returns the name of the character set of highest priority that <var>character</var> belongs to. <acronym>ASCII</acronym> characters are an exception: for them, this function always returns <code>ascii</code>. </p> <p>If <var>restriction</var> is non-<code>nil</code>, it should be a list of charsets to search. Alternatively, it can be a coding system, in which case the returned charset must be supported by that coding system (see <a href="coding-systems">Coding Systems</a>). </p>
+</dd>
+</dl> <dl> <dt id="charset-plist">Function: <strong>charset-plist</strong> <em>charset</em>
+</dt> <dd><p>This function returns the property list of the character set <var>charset</var>. Although <var>charset</var> is a symbol, this is not the same as the property list of that symbol. Charset properties include important information about the charset, such as its documentation string, short name, etc. </p></dd>
+</dl> <dl> <dt id="put-charset-property">Function: <strong>put-charset-property</strong> <em>charset propname value</em>
+</dt> <dd><p>This function sets the <var>propname</var> property of <var>charset</var> to the given <var>value</var>. </p></dd>
+</dl> <dl> <dt id="get-charset-property">Function: <strong>get-charset-property</strong> <em>charset propname</em>
+</dt> <dd><p>This function returns the value of <var>charset</var>’s property <var>propname</var>. </p></dd>
+</dl> <dl> <dt id="list-charset-chars">Command: <strong>list-charset-chars</strong> <em>charset</em>
+</dt> <dd><p>This command displays a list of characters in the character set <var>charset</var>. </p></dd>
+</dl> <p>Emacs can convert between its internal representation of a character and the character’s code point in a specific charset. The following two functions support these conversions. </p> <dl> <dt id="decode-char">Function: <strong>decode-char</strong> <em>charset code-point</em>
+</dt> <dd>
+<p>This function decodes the character that is assigned <var>code-point</var> in <var>charset</var> to the corresponding Emacs character, and returns it. If <var>charset</var> doesn’t contain a character with that code point, the value is <code>nil</code>. </p> <p>For backward compatibility, if <var>code-point</var> doesn’t fit in a Lisp fixnum (see <a href="integer-basics">most-positive-fixnum</a>), it can be specified as a cons cell <code>(<var>high</var> . <var>low</var>)</code>, where <var>low</var> holds the lower 16 bits of the value and <var>high</var> holds the upper 16 bits. This usage is obsolescent. </p>
+</dd>
+</dl> <dl> <dt id="encode-char">Function: <strong>encode-char</strong> <em>char charset</em>
+</dt> <dd><p>This function returns the code point assigned to the character <var>char</var> in <var>charset</var>. If <var>charset</var> doesn’t have a code point for <var>char</var>, the value is <code>nil</code>. </p></dd>
+</dl> <p>The following function comes in handy for applying a certain function to all or part of the characters in a charset: </p> <dl> <dt id="map-charset-chars">Function: <strong>map-charset-chars</strong> <em>function charset &amp;optional arg from-code to-code</em>
+</dt> <dd>
+<p>Call <var>function</var> for characters in <var>charset</var>. <var>function</var> is called with two arguments. The first one is a cons cell <code>(<var>from</var> . <var>to</var>)</code>, where <var>from</var> and <var>to</var> indicate a range of characters contained in <var>charset</var>. The second argument passed to <var>function</var> is <var>arg</var>, or <code>nil</code> if <var>arg</var> is omitted. </p> <p>By default, the ranges passed to <var>function</var> cover all the characters in <var>charset</var>, but the optional arguments <var>from-code</var> and <var>to-code</var> limit them to the characters between these two code points of <var>charset</var>. If either of them is <code>nil</code>, it defaults to the first or last code point of <var>charset</var>, respectively. Note that <var>from-code</var> and <var>to-code</var> are <var>charset</var>’s code points, not the Emacs codes of characters; by contrast, the values <var>from</var> and <var>to</var> in the cons cell passed to <var>function</var> <em>are</em> Emacs character codes. Those Emacs character codes are either Unicode code points, or Emacs internal code points that extend Unicode and are beyond the Unicode range of characters <code>0..#x10FFFF</code> (see <a href="text-representations">Text Representations</a>). The latter happens rarely, with legacy CJK charsets, for code points of <var>charset</var> that specify characters not yet unified with Unicode. </p>
+</dd>
+</dl><div class="_attribution">
+ <p class="_attribution-p">
+ Copyright &copy; 1990-1996, 1998-2022 Free Software Foundation, Inc. <br>Licensed under the GNU GPL license.<br>
+ <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Sets.html" class="_attribution-link">https://www.gnu.org/software/emacs/manual/html_node/elisp/Character-Sets.html</a>
+ </p>
+</div>