Diffstat (limited to 'devdocs/elisp/converting-representations.html')
| -rw-r--r-- | devdocs/elisp/converting-representations.html | 18 |
1 files changed, 18 insertions, 0 deletions
diff --git a/devdocs/elisp/converting-representations.html b/devdocs/elisp/converting-representations.html
new file mode 100644
index 00000000..1ef2afc0
--- /dev/null
+++ b/devdocs/elisp/converting-representations.html
@@ -0,0 +1,18 @@
+ <h3 class="section">Converting Text Representations</h3> <p>Emacs can convert unibyte text to multibyte; it can also convert multibyte text to unibyte, provided that the multibyte text contains only <acronym>ASCII</acronym> and 8-bit raw bytes. In general, these conversions happen when inserting text into a buffer, or when putting text from several strings together in one string. You can also explicitly convert a string’s contents to either representation. </p> <p>Emacs chooses the representation for a string based on the text from which it is constructed. The general rule is to convert unibyte text to multibyte text when combining it with other multibyte text, because the multibyte representation is more general and can hold whatever characters the unibyte text has. </p> <p>When inserting text into a buffer, Emacs converts the text to the buffer’s representation, as specified by <code>enable-multibyte-characters</code> in that buffer. In particular, when you insert multibyte text into a unibyte buffer, Emacs converts the text to unibyte, even though this conversion cannot in general preserve all the characters that might be in the multibyte text. The other natural alternative, to convert the buffer contents to multibyte, is not acceptable because the buffer’s representation is a choice made by the user that cannot be overridden automatically. </p> <p>Converting unibyte text to multibyte text leaves <acronym>ASCII</acronym> characters unchanged, and converts bytes with codes 128 through 255 to the multibyte representation of raw eight-bit bytes. </p> <p>Converting multibyte text to unibyte converts all <acronym>ASCII</acronym> and eight-bit characters to their single-byte form, but loses information for non-<acronym>ASCII</acronym> characters by discarding all but the low 8 bits of each character’s codepoint. Converting unibyte text to multibyte and back to unibyte reproduces the original unibyte text. </p> <p>The next two functions either return the argument <var>string</var>, or a newly created string with no text properties. </p> <dl> <dt id="string-to-multibyte">Function: <strong>string-to-multibyte</strong> <em>string</em>
+</dt> <dd><p>This function returns a multibyte string containing the same sequence of characters as <var>string</var>. If <var>string</var> is a multibyte string, it is returned unchanged. The function assumes that <var>string</var> includes only <acronym>ASCII</acronym> characters and raw 8-bit bytes; the latter are converted to their multibyte representation corresponding to the codepoints <code>#x3FFF80</code> through <code>#x3FFFFF</code>, inclusive (see <a href="text-representations">codepoints</a>). </p></dd>
+</dl> <dl> <dt id="string-to-unibyte">Function: <strong>string-to-unibyte</strong> <em>string</em>
+</dt> <dd><p>This function returns a unibyte string containing the same sequence of characters as <var>string</var>. If <var>string</var> is a unibyte string, it is returned unchanged. Otherwise, <acronym>ASCII</acronym> characters and characters in the <code>eight-bit</code> charset are converted to their corresponding byte values. Use this function for <var>string</var> arguments that contain only <acronym>ASCII</acronym> and eight-bit characters; the function signals an error if any other characters are encountered. </p></dd>
+</dl> <dl> <dt id="byte-to-string">Function: <strong>byte-to-string</strong> <em>byte</em>
+</dt> <dd>
+ <p>This function returns a unibyte string containing a single byte of character data, <var>byte</var>. It signals an error if <var>byte</var> is not an integer between 0 and 255. </p>
+</dd>
+</dl> <dl> <dt id="multibyte-char-to-unibyte">Function: <strong>multibyte-char-to-unibyte</strong> <em>char</em>
+</dt> <dd><p>This converts the multibyte character <var>char</var> to a unibyte character, and returns that character. If <var>char</var> is neither <acronym>ASCII</acronym> nor eight-bit, the function returns -1. </p></dd>
+</dl> <dl> <dt id="unibyte-char-to-multibyte">Function: <strong>unibyte-char-to-multibyte</strong> <em>char</em>
+</dt> <dd><p>This converts the unibyte character <var>char</var> to a multibyte character, assuming <var>char</var> is either <acronym>ASCII</acronym> or raw 8-bit byte. </p></dd>
+</dl><div class="_attribution">
+ <p class="_attribution-p">
+ Copyright © 1990-1996, 1998-2022 Free Software Foundation, Inc. <br>Licensed under the GNU GPL license.<br>
+ <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Converting-Representations.html" class="_attribution-link">https://www.gnu.org/software/emacs/manual/html_node/elisp/Converting-Representations.html</a>
+ </p>
+</div>
