sort(1)
NAME
sort - Sorts or merges files
SYNOPSIS
sort [-m] [-o output_file] [-Abdfinru] [-k keydef]... [-t character]
[-T directory] [-y] [kilobytes] [-z record_size]... file...
sort -c [-u] [-Abdfinru] [-k keydef]... [-t character] [-T directory]
[-y] [kilobytes] [-z record_size]... file...
The following older syntax is maintained for backward compatibility,
but may be withdrawn in future issues:
sort [-Abcdfimnru] [-o output_file] [-t character] [-T directory]
[-y] [kilobytes] [-z record_size] [+fskip] [.cskip] [-fskip] [.cskip]
[-bdfinr]... file...
STANDARDS
Interfaces documented on this reference page conform to industry stan‐
dards as follows:
sort: XCU5.0
Refer to the standards(5) reference page for more information about
industry standards and associated tags.
OPTIONS
The -d, -f, -i, -n, and -r options override the default ordering rules.
When ordering options appear independent of any key field specifica‐
tions, the requested field ordering rules are applied globally to all
sort keys. When attached to a specific key (see -k), the specified
ordering options override all global ordering options for that key. In
the obsolescent forms, if one or more of these options follows a +fskip
option, it affects only the key field specified by that preceding
option.
-A
[Tru64 UNIX] Sorts on a byte-by-byte basis using each character's
encoded value. On some systems, extended characters are considered
negative values, and so sort before ASCII characters. If you are
sorting ASCII characters in a non-C/POSIX locale, sorting with this
option is much faster.
-b
Ignores leading spaces and tabs when determining the starting and
ending positions of a restricted sort key. If the -b option is
specified before the first -k option, the -b option is applied to all
-k options on the command line; otherwise, the -b option can be
independently attached to each -k field_start or field_end argument.
-c
Checks that the input is sorted according to the ordering rules
specified in the options and the collating sequence of the current
locale. No output is produced; only the exit code is affected.
-d
Specifies that only spaces and alphanumeric characters (according to
the current setting of LC_CTYPE) are significant in comparisons.
-f
Treats all lowercase characters as their uppercase equivalents
(according to the current setting of LC_CTYPE) for the purposes of
comparison.
-i
Sorts only by printable characters (according to the current setting of
LC_CTYPE).
-k keydef
Specifies one or more (up to 50) restricted sort key field definitions.
This option replaces the obsolescent +fskip.cskip and -fskip.cskip
options. A field comprises a maximal sequence of non-separating
characters and, in the absence of the -t option, any preceding field
separator.
The format of a key field definition is as follows:
field_start[type][,field_end[type]]
The field_start and field_end arguments define a key field that
is restricted to a portion of the line, and type is a modifier
specified by b, d, f, i, n, r, or t. The b modifier behaves
like the -b option, but applies only to the field_start or
field_end argument to which it is attached. The t modifier
indicates that the key field is processed as CPU time. The other
modifiers behave like their corresponding options, but apply
only to the key field to which they are attached; these modi‐
fiers have this effect if specified with field_start, field_end
or both.
Modifiers attached to a field_start or field_end argument over‐
ride any specifications made by the options. A missing
field_end argument means the last character of the line. When
multiple sort keys are specified, it is advisable to specify a
field_end argument to avoid possible confusion.
The field_start portion of the keydef argument takes the follow‐
ing form: field_number[.first_character]
Fields and characters within fields are numbered starting with
1. The field_number and first_character pieces, interpreted as
positive decimal integers, specify the character to be used as
part of a sort key. If first_character is not specified, the
default is the first character of the field.
The field_end portion of the keydef argument takes the following
form: field_number[.last_character]
The field_number syntax is the same as that described for
field_start. The last_character argument, interpreted as a non‐
negative decimal integer, specifies the last character to be
used as part of the sort key. If last_character evaluates to 0
(zero) or is not specified, the default is the last character of
the field specified by field_number.
If -b is in effect, characters within a field are counted from
the first nonspace character in the field. (This applies sepa‐
rately to first_character and last_character.)
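The effect of the b modifier on character counting can be seen in a small experiment. The following is a sketch only: the two sample lines and the variable names are hypothetical, and LC_ALL=C is assumed so that comparison is by byte value.

```shell
# Two lines whose second field is preceded by a different number of spaces.
input=$(printf 'id  zebra\nid apple\n')

# Without b, the leading blanks belong to field 2 and take part in the
# comparison, so the line with two blanks sorts first.
plain=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2)

# With the b modifier, the key starts at the first nonblank character of
# field 2, so "apple" sorts before "zebra".
skipped=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2b)

printf 'without b:\n%s\nwith b:\n%s\n' "$plain" "$skipped"
```

The same modifier can be given globally as -b before the first -k option.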
If -k is not specified, the default sort key is the entire line.
When there are multiple key fields, later keys are compared only
after all earlier keys compare as equal. Except when the -u
option is specified, lines that otherwise compare as equal are
ordered as though none of the options -d, -f, -i, -n, or -k were
present (but with -r still in effect, if it was specified) and
with all bytes in the lines significant to the comparison.
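The key definition rules above can be tried with a short shell fragment. This is an illustrative sketch: the sample records, the colon separator, and the variable names are assumptions made for the example, and LC_ALL=C is set so that ties are resolved by byte value.

```shell
# Hypothetical sample records: name:count and date:tag.
counts=$(printf 'pear:3\napple:10\nfig:3\napple:2\n')
dates=$(printf '2024-03:c\n2023-11:a\n2025-01:b\n')

# Key 1 is field 2 only, compared numerically. Key 2 is field 1 only and
# is consulted only when key 1 compares as equal (fig:3 and pear:3).
by_count=$(printf '%s\n' "$counts" | LC_ALL=C sort -t : -k 2,2n -k 1,1)

# field_number.first_character and field_number.last_character: the key
# is characters 6 through 7 of field 1 (the month), compared numerically.
by_month=$(printf '%s\n' "$dates" | LC_ALL=C sort -t : -k 1.6,1.7n)

printf '%s\n\n%s\n' "$by_count" "$by_month"
```

Giving an explicit field_end (2,2 rather than 2) keeps each key from running on to the end of the line.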
The algorithm for the -k option can be summarized as follows:
/*
 * -ka.b,c.d = if d==0 then +(a-1).(b-1) -c.d
 *             else +(a-1).(b-1) -(c-1).d
 */
-m
Merges only (assumes sorted input).
-n
Sorts any initial numeric strings (consisting of optional spaces, an
optional dash, and zero (0) or more digits with optional radix
character and thousands separator, as defined by the current locale) by
arithmetic value. An empty digit string is treated as zero; leading
zeros and signs on zeros do not affect ordering. Only one period (.)
can be used in numeric strings. Any subsequent period (.) and all
characters to its right are ignored.
-o output_file
Directs output to output_file instead of standard output. The
output_file can be the same as one of the input files.
-r
Reverses the order of the specified sort.
-t character
Sets the field separator character to character. The character
argument is not considered to be part of a field (although it can be
included in a sort key). Each occurrence of character is significant
(for example, two consecutive occurrences of character delimit an empty
field). To specify the tab character as the field separator, you must
enclose it in ' ' (single quotes).
The default field separator is one or more spaces.
-T directory
[Tru64 UNIX] Places all the temporary files that are created in
directory.
-u
Suppresses all but one in each set of equal lines (for example, lines
whose sort keys match exactly). Ignored characters such as leading
tabs and spaces, and characters outside of sort keys, are not
considered in this type of comparison.
If used with the -c option, -u checks that there are no lines with
duplicate keys, in addition to checking that the input file is sorted.
-y [kilobytes]
[Tru64 UNIX] Starts the sort command using kilobytes of main storage
and adds storage as needed. (If kilobytes is less than the minimum
storage size or greater than the maximum, the minimum or maximum is
used instead.) If the -y option is omitted, the sort command starts
with the default storage size; -y 0 starts with minimum storage, and -y
(with no value) starts with the maximum storage. The amount of storage
used by the sort command has a significant impact on performance.
Sorting a small file in a large amount of storage is wasteful.
-z record_size
Prevents abnormal termination if lines being sorted are longer than the
default buffer size can handle. When the -c or -m options are
specified, the sorting phase is omitted and a system default size
buffer is used. If sorted lines are longer than this size, sort
terminates abnormally. The -z option specifies that the longest line
be recorded in the sort phase so that adequate buffers can be allocated
in the merge phase. The record_size argument must be a value in bytes
equal to or greater than the number of bytes in the longest line to be
merged.
+fskip.cskip
Specifies the start position of a key field. See the -k option for a
description of the current way to perform this operation.
(Obsolescent)
The fskip variable specifies the number of fields to skip from the
beginning of the input line, and the cskip variable specifies the
number of additional characters to skip to the right beyond that point.
For both the starting point (+fskip.cskip) and the ending point
(-fskip.cskip) of a sort key, fskip is measured from the beginning of
the input line, and cskip is measured from the last field skipped. If
you omit cskip, 0 (zero) is assumed. If you omit fskip, 0 (zero) is
assumed. If you omit the ending field specifier (-fskip.cskip), the
end of the line is the end of the sort key.
You can supply more than one sort key by repeating +fskip.cskip and
-fskip.cskip. When you specify more than one sort key, keys specified
further to the right on the command line are compared only after all
earlier keys compare as equal. For example, if the first key is to be
sorted in numerical order and the second according to the collating
sequence, all strings that start with the number 1 are sorted according
to the collating order before the strings that start with the number 2.
Lines that are identical in all keys are sorted with all characters
significant. You can also specify different options for different sort
keys.
-fskip.cskip
Specifies the end position of a key field. See the -k option for a
description of the current way to perform this operation.
(Obsolescent)
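The difference between the default ordering and the -n and -r options described in this section can be illustrated as follows. This is a sketch under stated assumptions: the sample values and variable names are hypothetical, and LC_ALL=C is set so that the default ordering is by byte value.

```shell
# Hypothetical input: numbers with a leading zero, a negative value, and
# a line with no digits at all.
values=$(printf '%s\n' 10 9 007 -3 abc)

# Default ordering compares characters left to right.
lexical=$(printf '%s\n' "$values" | LC_ALL=C sort)

# -n compares by arithmetic value; "abc" has an empty digit string and is
# treated as zero, and the leading zeros of 007 do not affect ordering.
numeric=$(printf '%s\n' "$values" | LC_ALL=C sort -n)

# -r reverses the order of the numeric sort.
descending=$(printf '%s\n' "$values" | LC_ALL=C sort -n -r)

printf 'default:\n%s\n-n:\n%s\n-n -r:\n%s\n' "$lexical" "$numeric" "$descending"
```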
DESCRIPTION
The sort command sorts lines in its input files and writes the result
to standard output.
The sort command performs one of the following functions:
Sorts lines of all the named files together and writes the result to
the specified output.
Merges lines of all the named (presorted) files together and writes the
result to the specified output.
Checks that a single input file is correctly presorted.
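The three functions listed above can be exercised in one short session. The following is a minimal sketch: the file names under the temporary directory and the variable names are hypothetical, and LC_ALL=C is assumed for a predictable order.

```shell
tmpdir=$(mktemp -d)
printf 'pear\napple\n'  > "$tmpdir/a"
printf 'fig\nbanana\n'  > "$tmpdir/b"

# 1. Sort: all named files are sorted together.
sorted=$(LC_ALL=C sort "$tmpdir/a" "$tmpdir/b")

# 2. Merge: each input must already be sorted, so presort with -o first.
LC_ALL=C sort -o "$tmpdir/a.sorted" "$tmpdir/a"
LC_ALL=C sort -o "$tmpdir/b.sorted" "$tmpdir/b"
merged=$(LC_ALL=C sort -m "$tmpdir/a.sorted" "$tmpdir/b.sorted")

# 3. Check: -c reports through the exit status only.
if LC_ALL=C sort -c "$tmpdir/a.sorted"; then
    check=sorted
else
    check=unsorted
fi

printf '%s\n' "$sorted" "--" "$merged" "--" "$check"
rm -rf "$tmpdir"
```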
Comparisons are based on one or more sort keys extracted from each line
of input (or the entire line if no sort keys are specified), and are
performed using the collating sequence of the current locale.
The sort command treats all of its input files as one file when it per‐
forms the sort. A - (dash) in place of a file name specifies standard
input. If you do not specify a file name, it sorts standard input.
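As an illustration of the input rules above, the following sketch combines standard input with a named file by using - (dash) as an operand. The file name and variable names are hypothetical, and LC_ALL=C is assumed.

```shell
tmpdir=$(mktemp -d)
printf 'cherry\napple\n' > "$tmpdir/list"

# The dash stands for standard input; both inputs are sorted as one file.
combined=$(printf 'banana\n' | LC_ALL=C sort - "$tmpdir/list")

# With no file operand at all, sort reads standard input.
from_stdin=$(printf 'b\na\n' | LC_ALL=C sort)

printf '%s\n' "$combined" "--" "$from_stdin"
rm -rf "$tmpdir"
```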
The sort command can handle a variety of collation rules typically used
in Western European languages, including primary/secondary sorting,
one-to-two character mapping, N-to-one character mapping, and ignore-
character mapping. To summarize briefly:
Primary/Secondary Sorting
In this system, a group of characters all sort to the same primary
location. If there is a tie, a secondary sort is applied. For exam‐
ple, in French, the plain and accented a's all sort to the same primary
location. If two strings collate to the same primary location, the
secondary sort goes into effect. These words are in correct French
order:
abord âpre après âpreté azur
One-to-Two Character Mappings
This system requires that certain single characters be treated as if
they were two characters. For example, in German, the ß (scharfes-S) is
collated as if it were ss.
N-to-One Character Mappings
Some languages treat a string of characters as if it were one single
collating element. For example, in Spanish, the ch and ll sequences
are treated as their own elements within the alphabet. (ch comes
between c and d in the alphabet, and ll comes between l and m.)
Ignore-Character Mappings
In some cases, certain characters may be ignored in collation. For
example, if - were defined as an ignore-character, the strings
re-locate and relocate would sort to the same place.
The results that you get from sort depend on the collating sequence as
defined by the current setting of the LC_COLLATE environment variable.
The configuration files for collation and character classification
information are /usr/lib/nls/loc/src/locale.src.
A field is one or more characters bounded by the beginning of a line
and the current field separator, or one or more characters bounded by a
field separator on either side. The space character is the default
field separator.
Lines longer than 1024 bytes are truncated by sort. The maximum number
of fields on a line is 50.
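The way fields are bounded by an explicit separator, including an empty field between two consecutive separators, can be checked with a small sketch. The two records and the variable names are hypothetical, and LC_ALL=C is assumed.

```shell
# In "b::3" the second field is empty and the third field is "3".
records=$(printf 'a:1:2\nb::3\n')

# Sorting on field 2 only: the empty field sorts before "1".
by_second=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 2,2)

# Sorting on field 3 only, numerically: 2 sorts before 3.
by_third=$(printf '%s\n' "$records" | LC_ALL=C sort -t : -k 3,3n)

printf 'field 2:\n%s\nfield 3:\n%s\n' "$by_second" "$by_third"
```

If consecutive colons were merged into one separator, field 2 of the second record would be "3" and the first ordering would be reversed.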
EXIT STATUS
The sort command returns the following exit values:
0
All input files were output successfully, or -c was specified and the
input file was correctly sorted.
1
Under the -c option, the file was not ordered as specified, or if the
-c and -u options were both specified, two input lines were found with
equal keys.
>1
An error occurred.
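A script can branch on these exit values. The following is a sketch only: the helper function status_of, the file names, and the variable names are hypothetical. Diagnostics are discarded because some implementations write a message to standard error when a check fails.

```shell
export LC_ALL=C
tmpdir=$(mktemp -d)
printf 'a\nb\nc\n' > "$tmpdir/sorted"
printf 'b\na\n'    > "$tmpdir/unsorted"
printf 'a\na\nb\n' > "$tmpdir/dups"

# Hypothetical helper: run a command quietly and print its exit status.
status_of() { "$@" >/dev/null 2>&1 && echo 0 || echo "$?"; }

sorted_status=$(status_of sort -c "$tmpdir/sorted")      # correctly sorted
unsorted_status=$(status_of sort -c "$tmpdir/unsorted")  # out of order
dups_status=$(status_of sort -c -u "$tmpdir/dups")       # duplicate keys
missing_status=$(status_of sort "$tmpdir/no-such-file")  # error case

printf 'sorted=%s unsorted=%s dups=%s missing=%s\n' \
    "$sorted_status" "$unsorted_status" "$dups_status" "$missing_status"
rm -rf "$tmpdir"
```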
EXAMPLES
The following examples apply to the C locale, unless it is specifically
stated otherwise.
1. To perform a simple sort, enter:
    sort fruits
This displays the contents of fruits sorted in ascending lexicographic
order. This means that the characters in each column are compared one
by one, including spaces, digits, and special characters.
For instance, if fruits contains the text:
    banana
    orange
    Persimmon
    apple
    %%banana
    apple
    ORANGE
Then sort fruits displays:
    %%banana
    ORANGE
    Persimmon
    apple
    apple
    banana
    orange
This order follows from the fact that in the ASCII collating sequence,
symbols (such as %) precede uppercase letters, and all uppercase
letters precede the lowercase letters. If you are using a different
collating order, your results may be different.
2. To group lines that contain uppercase and special characters with
similar lowercase lines, and remove duplicate lines, enter:
    sort -d -f -u fruits
The -u option tells sort to remove duplicate lines, making each line of
the file unique. This displays:
    apple
    %%banana
    orange
    Persimmon
Not only was the duplicate apple removed, but banana and ORANGE were
removed as well. The -d option told sort to ignore symbols, so
%%banana and banana were considered to be duplicate lines and banana
was removed. The -f option told sort not to differentiate between
uppercase and lowercase, so ORANGE and orange were considered to be
duplicate lines and ORANGE was removed.
When the -u option is used with input that contains nonidentical lines
that are considered by sort (due to other options) to be duplicates,
there is no way to predict which lines sort will keep and which it will
remove.
3. To sort as in Example 2, but remove duplicates unless capitalized or
punctuated differently, enter:
    sort -u -k 1df -k 1 fruits
Options appearing between sort key specifiers apply only to the
specifier preceding them. There are two sorts specified in this
command line. The -k 1df argument specifies the first sort, of the
same type done with -d -f in Example 2. Then -k 1 performs another
comparison to distinguish lines that are not actually identical. This
prevents -u, which applies to both sorts because it precedes the first
sort key specifier, from removing lines that are not exactly identical
to other lines.
Given the fruits file shown in Example 1, the added -k 1 distinguishes
%%banana from banana and ORANGE from orange. However, the two
instances of apple are exactly identical, so one of them is deleted.
    apple
    %%banana
    banana
    ORANGE
    orange
    Persimmon
4. To specify a new field separator, enter:
    sort -t : -k 2 vegetables
This sorts vegetables, comparing the text that follows the first colon
on each line. The -t : option tells sort that colons separate fields.
The -k 2 argument tells sort to ignore the first field and to compare
from the start of the second field to the end of the line. If
vegetables contains:
    yams:104
    turnips:8
    potatoes:15
    carrots:104
    green beans:32
    radishes:5
    lettuce:15
then sort -t : -k 2 vegetables displays:
    carrots:104
    yams:104
    lettuce:15
    potatoes:15
    green beans:32
    radishes:5
    turnips:8
The numbers are not in ascending order. This is because a
lexicographic sort compares each character from left to right. In
other words, 3 comes before 5 so 32 comes before 5.
5. To sort on more than one field, enter:
    sort -t : -k 2n -k 1r vegetables
This performs a numeric sort on the second field (-k 2n) and then,
within that ordering, sorts the first field in reverse collating order
(-k 1r). The output looks like this:
    radishes:5
    turnips:8
    potatoes:15
    lettuce:15
    green beans:32
    yams:104
    carrots:104
The lines are sorted in numeric order; when two lines have the same
number, they appear in reverse collating order.
6. To replace the original file with the sorted text, enter:
    sort -o vegetables vegetables
The -o vegetables option stores the sorted output into the file
vegetables.
7. To collate using Spanish rules, set the LC_COLLATE (or LANG)
environment variable to a Spanish locale, and then use sort in the
regular way:
    sort sp.words
If an input file named sp.words contains the following Spanish words:
    dama
    loro
    chapa
    canto
    mover
    chocolate
    curioso
    llanura
The sorted file looks like this:
    canto
    curioso
    chapa
    chocolate
    dama
    loro
    llanura
    mover
If you sort the file in the default C locale, the output looks like
this:
    canto
    chapa
    chocolate
    curioso
    dama
    llanura
    loro
    mover
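The fruits and vegetables examples can be reproduced with the following script. It is a sketch, not part of the original reference page: the temporary directory and the variable names are hypothetical, and LC_ALL=C is exported so that the results match the C locale assumed by the examples.

```shell
export LC_ALL=C
tmpdir=$(mktemp -d)

# Recreate the sample files, one record per line.
printf '%s\n' banana orange Persimmon apple %%banana apple ORANGE \
    > "$tmpdir/fruits"
printf '%s\n' yams:104 turnips:8 potatoes:15 carrots:104 \
    'green beans:32' radishes:5 lettuce:15 > "$tmpdir/vegetables"

ex1=$(sort "$tmpdir/fruits")                         # simple sort
ex2=$(sort -d -f -u "$tmpdir/fruits")                # which duplicates survive is unspecified
ex3=$(sort -u -k 1df -k 1 "$tmpdir/fruits")          # keep near-duplicates
ex4=$(sort -t : -k 2 "$tmpdir/vegetables")           # lexicographic on field 2
ex5=$(sort -t : -k 2n -k 1r "$tmpdir/vegetables")    # numeric, then reverse

# Sort the file in place; -o allows the output to be an input file.
sort -o "$tmpdir/vegetables" "$tmpdir/vegetables"
ex6=$(cat "$tmpdir/vegetables")

printf '%s\n\n' "$ex1" "$ex2" "$ex3" "$ex4" "$ex5" "$ex6"
rm -rf "$tmpdir"
```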
ENVIRONMENT VARIABLES
The following environment variables affect the execution of sort:
LANG
Provides a default value for the internationalization variables that
are unset or null. If LANG is unset or null, the corresponding value
from the default locale is used. If any of the internationalization
variables contain an invalid setting, the utility behaves as if none of
the variables had been defined.
LC_ALL
If set to a non-empty string value, overrides the values of all the
other internationalization variables.
LC_CTYPE
Determines the locale for the interpretation of sequences of bytes of
text data as characters (for example, single-byte as opposed to
multibyte characters in arguments) and the behavior of character
classification for the -b, -d, -f, -i, and -n options.
LC_MESSAGES
Determines the locale for the format and contents of diagnostic
messages written to standard error.
NLSPATH
Determines the location of message catalogues for the processing of
LC_MESSAGES.
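The precedence among these variables can be observed from the shell. The following is a sketch under stated assumptions: the sample words and variable names are hypothetical, and en_US.UTF-8 is only an example locale name that may not be installed on a given system.

```shell
words=$(printf 'apple\nBanana\ncherry\n')

# In the C locale, uppercase letters precede lowercase letters.
c_order=$(printf '%s\n' "$words" | LC_ALL=C sort)

# A non-empty LC_ALL overrides LANG and LC_COLLATE, so the result is
# still the C ordering regardless of the other two settings.
overridden=$(printf '%s\n' "$words" |
    LANG=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 LC_ALL=C sort)

printf '%s\n--\n%s\n' "$c_order" "$overridden"
```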
FILES
Configuration files
SEE ALSO
Commands: comm(1), join(1), uniq(1)
Functions: setlocale(3), tolower(3)
Files: locale(4)
Standards: standards(5)