summaryrefslogtreecommitdiff
path: root/devdocs/elisp/sorting.html
blob: 0cfd41ed3d45fa2e5babcbeb6eed5b05356b8e4a (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
 <h3 class="section">Sorting Text</h3>  <p>The sorting functions described in this section all rearrange text in a buffer. This is in contrast to the function <code>sort</code>, which rearranges the order of the elements of a list (see <a href="rearrangement">Rearrangement</a>). The values returned by these functions are not meaningful. </p> <dl> <dt id="sort-subr">Function: <strong>sort-subr</strong> <em>reverse nextrecfun endrecfun &amp;optional startkeyfun endkeyfun predicate</em>
</dt> <dd>
<p>This function is the general text-sorting routine that subdivides a buffer into records and then sorts them. Most of the commands in this section use this function. </p> <p>To understand how <code>sort-subr</code> works, consider the whole accessible portion of the buffer as being divided into disjoint pieces called <em>sort records</em>. The records may or may not be contiguous, but they must not overlap. A portion of each sort record (perhaps all of it) is designated as the sort key. Sorting rearranges the records in order by their sort keys. </p> <p>Usually, the records are rearranged in order of ascending sort key. If the first argument to the <code>sort-subr</code> function, <var>reverse</var>, is non-<code>nil</code>, the sort records are rearranged in order of descending sort key. </p> <p>The next four arguments to <code>sort-subr</code> are functions that are called to move point across a sort record. They are called many times from within <code>sort-subr</code>. </p> <ol> <li> <var>nextrecfun</var> is called with point at the end of a record. This function moves point to the start of the next record. The first record is assumed to start at the position of point when <code>sort-subr</code> is called. Therefore, you should usually move point to the beginning of the buffer before calling <code>sort-subr</code>. <p>This function can indicate there are no more sort records by leaving point at the end of the buffer. </p> </li>
<li> <var>endrecfun</var> is called with point within a record. It moves point to the end of the record. </li>
<li> <var>startkeyfun</var> is called to move point from the start of a record to the start of the sort key. This argument is optional; if it is omitted, the whole record is the sort key. If supplied, the function should either return a non-<code>nil</code> value to be used as the sort key, or return <code>nil</code> to indicate that the sort key is in the buffer starting at point. In the latter case, <var>endkeyfun</var> is called to find the end of the sort key. </li>
<li> <var>endkeyfun</var> is called to move point from the start of the sort key to the end of the sort key. This argument is optional. If <var>startkeyfun</var> returns <code>nil</code> and this argument is omitted (or <code>nil</code>), then the sort key extends to the end of the record. There is no need for <var>endkeyfun</var> if <var>startkeyfun</var> returns a non-<code>nil</code> value. </li>
</ol> <p>The argument <var>predicate</var> is the function to use to compare keys. It is called with two arguments, the keys to compare, and should return non-<code>nil</code> if the first key should come before the second in the sorting order. What exactly are the key arguments depends on what <var>startkeyfun</var> and <var>endkeyfun</var> return. If <var>predicate</var> is omitted or <code>nil</code>, it defaults to <code>&lt;</code> if the keys are numbers, to <code>compare-buffer-substrings</code> if the keys are cons cells (whose <code>car</code> and <code>cdr</code> are start and end buffer positions of the key), and to <code>string&lt;</code> otherwise (with keys assumed to be strings). </p> <p>As an example of <code>sort-subr</code>, here is the complete function definition for <code>sort-lines</code>: </p> <div class="example"> <pre class="example">;; <span class="roman">Note that the first two lines of doc string</span>
;; <span class="roman">are effectively one line when viewed by a user.</span>
(defun sort-lines (reverse beg end)
  "Sort lines in region alphabetically;\
 argument means descending order.
Called from a program, there are three arguments:
</pre>
<pre class="example">REVERSE (non-nil means reverse order),\
 BEG and END (region to sort).
The variable `sort-fold-case' determines\
 whether alphabetic case affects
the sort order."
</pre>
<pre class="example">  (interactive "P\nr")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (let ((inhibit-field-text-motion t))
        (sort-subr reverse 'forward-line 'end-of-line)))))
</pre>
</div> <p>Here <code>forward-line</code> moves point to the start of the next record, and <code>end-of-line</code> moves point to the end of record. We do not pass the arguments <var>startkeyfun</var> and <var>endkeyfun</var>, because the entire record is used as the sort key. </p> <p>The <code>sort-paragraphs</code> function is very much the same, except that its <code>sort-subr</code> call looks like this: </p> <div class="example"> <pre class="example">(sort-subr reverse
           (lambda ()
             (while (and (not (eobp))
                         (looking-at paragraph-separate))
               (forward-line 1)))
           'forward-paragraph)
</pre>
</div> <p>Markers pointing into any sort records are left with no useful position after <code>sort-subr</code> returns. </p>
</dd>
</dl> <dl> <dt id="sort-fold-case">User Option: <strong>sort-fold-case</strong>
</dt> <dd><p>If this variable is non-<code>nil</code>, <code>sort-subr</code> and the other buffer sorting functions ignore case when comparing strings. </p></dd>
</dl> <dl> <dt id="sort-regexp-fields">Command: <strong>sort-regexp-fields</strong> <em>reverse record-regexp key-regexp start end</em>
</dt> <dd>
<p>This command sorts the region between <var>start</var> and <var>end</var> alphabetically as specified by <var>record-regexp</var> and <var>key-regexp</var>. If <var>reverse</var> is a negative integer, then sorting is in reverse order. </p> <p>Alphabetical sorting means that two sort keys are compared by comparing the first characters of each, the second characters of each, and so on. If a mismatch is found, it means that the sort keys are unequal; the sort key whose character is less at the point of first mismatch is the lesser sort key. The individual characters are compared according to their numerical character codes in the Emacs character set. </p> <p>The value of the <var>record-regexp</var> argument specifies how to divide the buffer into sort records. At the end of each record, a search is done for this regular expression, and the text that matches it is taken as the next record. For example, the regular expression ‘<samp>^.+$</samp>’, which matches lines with at least one character besides a newline, would make each such line into a sort record. See <a href="regular-expressions">Regular Expressions</a>, for a description of the syntax and meaning of regular expressions. </p> <p>The value of the <var>key-regexp</var> argument specifies what part of each record is the sort key. The <var>key-regexp</var> could match the whole record, or only a part. In the latter case, the rest of the record has no effect on the sorted order of records, but it is carried along when the record moves to its new position. </p> <p>The <var>key-regexp</var> argument can refer to the text matched by a subexpression of <var>record-regexp</var>, or it can be a regular expression on its own. </p> <p>If <var>key-regexp</var> is: </p> <dl compact> <dt>‘<samp>\<var>digit</var></samp>’</dt> <dd>
<p>then the text matched by the <var>digit</var>th ‘<samp>\(...\)</samp>’ parenthesis grouping in <var>record-regexp</var> is the sort key. </p> </dd> <dt>‘<samp>\&amp;</samp>’</dt> <dd>
<p>then the whole record is the sort key. </p> </dd> <dt>a regular expression</dt> <dd><p>then <code>sort-regexp-fields</code> searches for a match for the regular expression within the record. If such a match is found, it is the sort key. If there is no match for <var>key-regexp</var> within a record then that record is ignored, which means its position in the buffer is not changed. (The other records may move around it.) </p></dd> </dl> <p>For example, if you plan to sort all the lines in the region by the first word on each line starting with the letter ‘<samp>f</samp>’, you should set <var>record-regexp</var> to ‘<samp>^.*$</samp>’ and set <var>key-regexp</var> to ‘<samp>\&lt;f\w*\&gt;</samp>’. The resulting expression looks like this: </p> <div class="example"> <pre class="example">(sort-regexp-fields nil "^.*$" "\\&lt;f\\w*\\&gt;"
                    (region-beginning)
                    (region-end))
</pre>
</div> <p>If you call <code>sort-regexp-fields</code> interactively, it prompts for <var>record-regexp</var> and <var>key-regexp</var> in the minibuffer. </p>
</dd>
</dl> <dl> <dt id="sort-lines">Command: <strong>sort-lines</strong> <em>reverse start end</em>
</dt> <dd><p>This command alphabetically sorts lines in the region between <var>start</var> and <var>end</var>. If <var>reverse</var> is non-<code>nil</code>, the sort is in reverse order. </p></dd>
</dl> <dl> <dt id="sort-paragraphs">Command: <strong>sort-paragraphs</strong> <em>reverse start end</em>
</dt> <dd><p>This command alphabetically sorts paragraphs in the region between <var>start</var> and <var>end</var>. If <var>reverse</var> is non-<code>nil</code>, the sort is in reverse order. </p></dd>
</dl> <dl> <dt id="sort-pages">Command: <strong>sort-pages</strong> <em>reverse start end</em>
</dt> <dd><p>This command alphabetically sorts pages in the region between <var>start</var> and <var>end</var>. If <var>reverse</var> is non-<code>nil</code>, the sort is in reverse order. </p></dd>
</dl> <dl> <dt id="sort-fields">Command: <strong>sort-fields</strong> <em>field start end</em>
</dt> <dd><p>This command sorts lines in the region between <var>start</var> and <var>end</var>, comparing them alphabetically by the <var>field</var>th field of each line. Fields are separated by whitespace and numbered starting from 1. If <var>field</var> is negative, sorting is by the -<var>field</var>th field from the end of the line. This command is useful for sorting tables. </p></dd>
</dl> <dl> <dt id="sort-numeric-fields">Command: <strong>sort-numeric-fields</strong> <em>field start end</em>
</dt> <dd>
<p>This command sorts lines in the region between <var>start</var> and <var>end</var>, comparing them numerically by the <var>field</var>th field of each line. Fields are separated by whitespace and numbered starting from 1. The specified field must contain a number in each line of the region. Numbers starting with 0 are treated as octal, and numbers starting with ‘<samp>0x</samp>’ are treated as hexadecimal. </p> <p>If <var>field</var> is negative, sorting is by the -<var>field</var>th field from the end of the line. This command is useful for sorting tables. </p>
</dd>
</dl> <dl> <dt id="sort-numeric-base">User Option: <strong>sort-numeric-base</strong>
</dt> <dd><p>This variable specifies the default radix for <code>sort-numeric-fields</code> to parse numbers. </p></dd>
</dl> <dl> <dt id="sort-columns">Command: <strong>sort-columns</strong> <em>reverse &amp;optional beg end</em>
</dt> <dd>
<p>This command sorts the lines in the region between <var>beg</var> and <var>end</var>, comparing them alphabetically by a certain range of columns. The column positions of <var>beg</var> and <var>end</var> bound the range of columns to sort on. </p> <p>If <var>reverse</var> is non-<code>nil</code>, the sort is in reverse order. </p> <p>One unusual thing about this command is that the entire line containing position <var>beg</var>, and the entire line containing position <var>end</var>, are included in the region sorted. </p> <p>Note that <code>sort-columns</code> rejects text that contains tabs, because tabs could be split across the specified columns. Use <kbd>M-x untabify</kbd> to convert tabs to spaces before sorting. </p> <p>When possible, this command actually works by calling the <code>sort</code> utility program. </p>
</dd>
</dl><div class="_attribution">
  <p class="_attribution-p">
    Copyright &copy; 1990-1996, 1998-2022 Free Software Foundation, Inc. <br>Licensed under the GNU GPL license.<br>
    <a href="https://www.gnu.org/software/emacs/manual/html_node/elisp/Sorting.html" class="_attribution-link">https://www.gnu.org/software/emacs/manual/html_node/elisp/Sorting.html</a>
  </p>
</div>