diff options
| author | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
|---|---|---|
| committer | Craig Jennings <c@cjennings.net> | 2024-04-07 13:41:34 -0500 |
| commit | 754bbf7a25a8dda49b5d08ef0d0443bbf5af0e36 (patch) | |
| tree | f1190704f78f04a2b0b4c977d20fe96a828377f1 /devdocs/elisp/text-comparison.html | |
new repository
Diffstat (limited to 'devdocs/elisp/text-comparison.html')
| -rw-r--r-- | devdocs/elisp/text-comparison.html | 90 |
1 files changed, 90 insertions, 0 deletions
diff --git a/devdocs/elisp/text-comparison.html b/devdocs/elisp/text-comparison.html new file mode 100644 index 00000000..073f1978 --- /dev/null +++ b/devdocs/elisp/text-comparison.html @@ -0,0 +1,90 @@ + <h3 class="section">Comparison of Characters and Strings</h3> <dl> <dt id="char-equal">Function: <strong>char-equal</strong> <em>character1 character2</em> +</dt> <dd> +<p>This function returns <code>t</code> if the arguments represent the same character, <code>nil</code> otherwise. This function ignores differences in case if <code>case-fold-search</code> is non-<code>nil</code>. </p> <div class="example"> <pre class="example">(char-equal ?x ?x) + ⇒ t +(let ((case-fold-search nil)) + (char-equal ?x ?X)) + ⇒ nil +</pre> +</div> </dd> +</dl> <dl> <dt id="string=">Function: <strong>string=</strong> <em>string1 string2</em> +</dt> <dd> +<p>This function returns <code>t</code> if the characters of the two strings match exactly. Symbols are also allowed as arguments, in which case the symbol names are used. Case is always significant, regardless of <code>case-fold-search</code>. </p> <p>This function is equivalent to <code>equal</code> for comparing two strings (see <a href="equality-predicates">Equality Predicates</a>). In particular, the text properties of the two strings are ignored; use <code>equal-including-properties</code> if you need to distinguish between strings that differ only in their text properties. However, unlike <code>equal</code>, if either argument is not a string or symbol, <code>string=</code> signals an error. </p> <div class="example"> <pre class="example">(string= "abc" "abc") + ⇒ t +(string= "abc" "ABC") + ⇒ nil +(string= "ab" "ABC") + ⇒ nil +</pre> +</div> <p>For technical reasons, a unibyte and a multibyte string are <code>equal</code> if and only if they contain the same sequence of character codes and all these codes are either in the range 0 through 127 (<acronym>ASCII</acronym>) or 160 through 255 (<code>eight-bit-graphic</code>). However, when a unibyte string is converted to a multibyte string, all characters with codes in the range 160 through 255 are converted to characters with higher codes, whereas <acronym>ASCII</acronym> characters remain unchanged. Thus, a unibyte string and its conversion to multibyte are only <code>equal</code> if the string is all <acronym>ASCII</acronym>. Character codes 160 through 255 are not entirely proper in multibyte text, even though they can occur. As a consequence, the situation where a unibyte and a multibyte string are <code>equal</code> without both being all <acronym>ASCII</acronym> is a technical oddity that very few Emacs Lisp programmers ever get confronted with. See <a href="text-representations">Text Representations</a>. </p> +</dd> +</dl> <dl> <dt id="string-equal">Function: <strong>string-equal</strong> <em>string1 string2</em> +</dt> <dd><p><code>string-equal</code> is another name for <code>string=</code>. </p></dd> +</dl> <dl> <dt id="string-collate-equalp">Function: <strong>string-collate-equalp</strong> <em>string1 string2 &optional locale ignore-case</em> +</dt> <dd> +<p>This function returns <code>t</code> if <var>string1</var> and <var>string2</var> are equal with respect to collation rules. A collation rule is not only determined by the lexicographic order of the characters contained in <var>string1</var> and <var>string2</var>, but also further rules about relations between these characters. Usually, it is defined by the <var>locale</var> environment Emacs is running with and by the Standard C library against which Emacs was linked<a id="DOCF3" href="#FOOT3"><sup>3</sup></a>. </p> <p>For example, characters with different code points but the same meaning, like different grave accent Unicode characters, might, in some locales, be considered as equal: </p> <div class="example"> <pre class="example">(string-collate-equalp (string ?\uFF40) (string ?\u1FEF)) + ⇒ t +</pre> +</div> <p>The optional argument <var>locale</var>, a string, overrides the setting of your current locale identifier for collation. The value is system dependent; a <var>locale</var> <code>"en_US.UTF-8"</code> is applicable on POSIX systems, while it would be, e.g., <code>"enu_USA.1252"</code> on MS-Windows systems. </p> <p>If <var>ignore-case</var> is non-<code>nil</code>, characters are converted to lower-case before comparing them. </p> <p>To emulate Unicode-compliant collation on MS-Windows systems, bind <code>w32-collate-ignore-punctuation</code> to a non-<code>nil</code> value, since the codeset part of the locale cannot be <code>"UTF-8"</code> on MS-Windows. </p> <p>If your system does not support a locale environment, this function behaves like <code>string-equal</code>. </p> <p>Do <em>not</em> use this function to compare file names for equality, as filesystems generally don’t honor linguistic equivalence of strings that collation implements. </p> +</dd> +</dl> <dl> <dt id="string<">Function: <strong>string<</strong> <em>string1 string2</em> +</dt> <dd> +<p>This function compares two strings a character at a time. It scans both the strings at the same time to find the first pair of corresponding characters that do not match. If the lesser character of these two is the character from <var>string1</var>, then <var>string1</var> is less, and this function returns <code>t</code>. If the lesser character is the one from <var>string2</var>, then <var>string1</var> is greater, and this function returns <code>nil</code>. If the two strings match entirely, the value is <code>nil</code>. </p> <p>Pairs of characters are compared according to their character codes. Keep in mind that lower case letters have higher numeric values in the <acronym>ASCII</acronym> character set than their upper case counterparts; digits and many punctuation characters have a lower numeric value than upper case letters. An <acronym>ASCII</acronym> character is less than any non-<acronym>ASCII</acronym> character; a unibyte non-<acronym>ASCII</acronym> character is always less than any multibyte non-<acronym>ASCII</acronym> character (see <a href="text-representations">Text Representations</a>). </p> <div class="example"> <pre class="example">(string< "abc" "abd") + ⇒ t +(string< "abd" "abc") + ⇒ nil +(string< "123" "abc") + ⇒ t +</pre> +</div> <p>When the strings have different lengths, and they match up to the length of <var>string1</var>, then the result is <code>t</code>. If they match up to the length of <var>string2</var>, the result is <code>nil</code>. A string of no characters is less than any other string. </p> <div class="example"> <pre class="example">(string< "" "abc") + ⇒ t +(string< "ab" "abc") + ⇒ t +(string< "abc" "") + ⇒ nil +(string< "abc" "ab") + ⇒ nil +(string< "" "") + ⇒ nil +</pre> +</div> <p>Symbols are also allowed as arguments, in which case their print names are compared. </p> +</dd> +</dl> <dl> <dt id="string-lessp">Function: <strong>string-lessp</strong> <em>string1 string2</em> +</dt> <dd><p><code>string-lessp</code> is another name for <code>string<</code>. </p></dd> +</dl> <dl> <dt id="string-greaterp">Function: <strong>string-greaterp</strong> <em>string1 string2</em> +</dt> <dd><p>This function returns the result of comparing <var>string1</var> and <var>string2</var> in the opposite order, i.e., it is equivalent to calling <code>(string-lessp <var>string2</var> <var>string1</var>)</code>. </p></dd> +</dl> <dl> <dt id="string-collate-lessp">Function: <strong>string-collate-lessp</strong> <em>string1 string2 &optional locale ignore-case</em> +</dt> <dd> +<p>This function returns <code>t</code> if <var>string1</var> is less than <var>string2</var> in collation order. A collation order is not only determined by the lexicographic order of the characters contained in <var>string1</var> and <var>string2</var>, but also further rules about relations between these characters. Usually, it is defined by the <var>locale</var> environment Emacs is running with. </p> <p>For example, punctuation and whitespace characters might be ignored for sorting (see <a href="sequence-functions">Sequence Functions</a>): </p> <div class="example"> <pre class="example">(sort (list "11" "12" "1 1" "1 2" "1.1" "1.2") 'string-collate-lessp) + ⇒ ("11" "1 1" "1.1" "12" "1 2" "1.2") +</pre> +</div> <p>This behavior is system-dependent; e.g., punctuation and whitespace are never ignored on Cygwin, regardless of locale. </p> <p>The optional argument <var>locale</var>, a string, overrides the setting of your current locale identifier for collation. The value is system dependent; a <var>locale</var> <code>"en_US.UTF-8"</code> is applicable on POSIX systems, while it would be, e.g., <code>"enu_USA.1252"</code> on MS-Windows systems. The <var>locale</var> value of <code>"POSIX"</code> or <code>"C"</code> lets <code>string-collate-lessp</code> behave like <code>string-lessp</code>: </p> <div class="example"> <pre class="example">(sort (list "11" "12" "1 1" "1 2" "1.1" "1.2") + (lambda (s1 s2) (string-collate-lessp s1 s2 "POSIX"))) + ⇒ ("1 1" "1 2" "1.1" "1.2" "11" "12") +</pre> +</div> <p>If <var>ignore-case</var> is non-<code>nil</code>, characters are converted to lower-case before comparing them. </p> <p>To emulate Unicode-compliant collation on MS-Windows systems, bind <code>w32-collate-ignore-punctuation</code> to a non-<code>nil</code> value, since the codeset part of the locale cannot be <code>"UTF-8"</code> on MS-Windows. </p> <p>If your system does not support a locale environment, this function behaves like <code>string-lessp</code>. </p> +</dd> +</dl> <dl> <dt id="string-version-lessp">Function: <strong>string-version-lessp</strong> <em>string1 string2</em> +</dt> <dd><p>This function compares strings lexicographically, except it treats sequences of numerical characters as if they comprised a base-ten number, and then compares the numbers. So ‘<samp>foo2.png</samp>’ is “smaller” than ‘<samp>foo12.png</samp>’ according to this predicate, even if ‘<samp>12</samp>’ is lexicographically “smaller” than ‘<samp>2</samp>’. </p></dd> +</dl> <dl> <dt id="string-prefix-p">Function: <strong>string-prefix-p</strong> <em>string1 string2 &optional ignore-case</em> +</dt> <dd><p>This function returns non-<code>nil</code> if <var>string1</var> is a prefix of <var>string2</var>; i.e., if <var>string2</var> starts with <var>string1</var>. If the optional argument <var>ignore-case</var> is non-<code>nil</code>, the comparison ignores case differences. </p></dd> +</dl> <dl> <dt id="string-suffix-p">Function: <strong>string-suffix-p</strong> <em>suffix string &optional ignore-case</em> +</dt> <dd><p>This function returns non-<code>nil</code> if <var>suffix</var> is a suffix of <var>string</var>; i.e., if <var>string</var> ends with <var>suffix</var>. If the optional argument <var>ignore-case</var> is non-<code>nil</code>, the comparison ignores case differences. </p></dd> +</dl> <dl> <dt id="string-search">Function: <strong>string-search</strong> <em>needle haystack &optional start-pos</em> +</dt> <dd><p>Return the position of the first instance of <var>needle</var> in <var>haystack</var>, both of which are strings. If <var>start-pos</var> is non-<code>nil</code>, start searching from that position in <var>needle</var>. Return <code>nil</code> if no match was found. This function only considers the characters in the strings when doing the comparison; text properties are ignored. Matching is always case-sensitive. </p></dd> +</dl> <dl> <dt id="compare-strings">Function: <strong>compare-strings</strong> <em>string1 start1 end1 string2 start2 end2 &optional ignore-case</em> +</dt> <dd> +<p>This function compares a specified part of <var>string1</var> with a specified part of <var>string2</var>. The specified part of <var>string1</var> runs from index <var>start1</var> (inclusive) up to index <var>end1</var> (exclusive); <code>nil</code> for <var>start1</var> means the start of the string, while <code>nil</code> for <var>end1</var> means the length of the string. Likewise, the specified part of <var>string2</var> runs from index <var>start2</var> up to index <var>end2</var>. </p> <p>The strings are compared by the numeric values of their characters. For instance, <var>str1</var> is considered less than <var>str2</var> if its first differing character has a smaller numeric value. If <var>ignore-case</var> is non-<code>nil</code>, characters are converted to upper-case, using the current buffer’s case-table (see <a href="case-tables">Case Tables</a>), before comparing them. Unibyte strings are converted to multibyte for comparison (see <a href="text-representations">Text Representations</a>), so that a unibyte string and its conversion to multibyte are always regarded as equal. </p> <p>If the specified portions of the two strings match, the value is <code>t</code>. Otherwise, the value is an integer which indicates how many leading characters agree, and which string is less. Its absolute value is one plus the number of characters that agree at the beginning of the two strings. The sign is negative if <var>string1</var> (or its specified portion) is less. </p> +</dd> +</dl> <dl> <dt id="string-distance">Function: <strong>string-distance</strong> <em>string1 string2 &optional bytecompare</em> +</dt> <dd> +<p>This function returns the <em>Levenshtein distance</em> between the source string <var>string1</var> and the target string <var>string2</var>. The Levenshtein distance is the number of single-character changes—deletions, insertions, or replacements—required to transform the source string into the target string; it is one possible definition of the <em>edit distance</em> between strings. </p> <p>Letter-case of the strings is significant for the computed distance, but their text properties are ignored. If the optional argument <var>bytecompare</var> is non-<code>nil</code>, the function calculates the distance in terms of bytes instead of characters. The byte-wise comparison uses the internal Emacs representation of characters, so it will produce inaccurate results for multibyte strings that include raw bytes (see <a href="text-representations">Text Representations</a>); make the strings unibyte by encoding them (see <a href="explicit-encoding">Explicit Encoding</a>) if you need accurate results with raw bytes. </p> +</dd> +</dl> <dl> <dt id="assoc-string">Function: <strong>assoc-string</strong> <em>key alist &optional case-fold</em> +</dt> <dd><p>This function works like <code>assoc</code>, except that <var>key</var> must be a string or symbol, and comparison is done using <code>compare-strings</code>. Symbols are converted to strings before testing. If <var>case-fold</var> is non-<code>nil</code>, <var>key</var> and the elements of <var>alist</var> are converted to upper-case before comparison. Unlike <code>assoc</code>, this function can also match elements of the alist that are strings or symbols rather than conses. In particular, <var>alist</var> can be a list of strings or symbols rather than an actual alist. See <a href="association-lists">Association Lists</a>. </p></dd> +</dl> <p>See also the function <code>compare-buffer-substrings</code> in <a href="comparing-text">Comparing Text</a>, for a way to compare text in buffers. The function <code>string-match</code>, which matches a regular expression against a string, can be used for a kind of string comparison; see <a href="regexp-search">Regexp Search</a>. </p><div class="_attribution"> + <p class="_attribution-p"> + Copyright © 1990-1996, 1998-2022 Free Software Foundation, Inc. <br>Licensed under the GNU GPL license.<br> + <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Comparison.html" class="_attribution-link">https://www.gnu.org/software/emacs/manual/html_node/elisp/Text-Comparison.html</a> + </p> +</div> |
