Advanced Shell Topics: sort, uniq and cut


sort

sort does exactly what its name says: it sorts the lines of its input. By default it sorts them alphabetically:

[user@host ~]$ cat characters 
3
g
1
6
i
f
7
8
zebra
d
10
19
20
3
l
9
c
83
7
b
dog
a
e
h
j
cat
k
[user@host ~]$ sort characters 
1
10
19
20
3
3
6
7
7
8
83
9
a
b
c
cat
d
dog
e
f
g
h
i
j
k
l
zebra

It is important to note that sort compares lines character by character: the first character first, then the second, and so on. This means that multi-digit numbers will usually end up out of numerical order. 100 will come before 2 instead of after, and in the output above 10 comes before 3. You can correct this for numbers by using the -n option:

[user@host ~]$ sort -n characters 
a
b
c
cat
d
dog
e
f
g
h
i
j
k
l
zebra
1
3
3
6
7
7
8
9
10
19
20
83

The numbers are now sorted, but strangely enough they come after the letters instead of before them. This is because sort is now comparing complete numerical values instead of character values, and a line that does not begin with a number is treated as having the value 0, so it sorts ahead of every positive number.
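
Here is a minimal sketch of both orderings side by side. The sample values and the temporary file name mixed.txt are made up for illustration, and LC_ALL=C is set only so that the plain sort compares raw character values no matter what locale you are running in.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' 10 9 apple 2 > "$tmpdir/mixed.txt"

# Plain sort compares character by character, so "10" lands before "2".
plain=$(LC_ALL=C sort "$tmpdir/mixed.txt")

# sort -n compares numeric values; "apple" counts as 0 and comes first.
numeric=$(LC_ALL=C sort -n "$tmpdir/mixed.txt")

echo "plain sort:"
echo "$plain"      # 10, 2, 9, apple (one per line)
echo "sort -n:"
echo "$numeric"    # apple, 2, 9, 10 (one per line)

rm -r "$tmpdir"
```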



uniq

uniq is a program that compares neighboring lines for uniqueness. If neighboring lines are the same, it keeps only one of them:

[user@host ~]$ cat animals
cat
cat
cat
dog
dog
dog
dog
dog
dog
elephant
elephant
[user@host ~]$ uniq animals 
cat
dog
elephant

This does not mean that every instance of a word will be reduced to one. For instance, if I mix up the animals a bit:

[user@host ~]$ cat animals
cat
dog
cat
cat
dog
dog
elephant
dog
dog
dog
dog
elephant
elephant
cat
[user@host ~]$ uniq animals
cat
dog
cat
dog
elephant
dog
elephant
cat

Only identical lines that sit right next to each other are compacted down to one line. You can use a combination of sort and uniq to get around this:

[user@host ~]$ sort animals | uniq
cat
dog
elephant
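
The same idea as a small sketch you can paste into a shell. The sample data is made up, and the use of sort -u as a shorthand is my addition rather than something shown above; it is a standard sort option that drops duplicate lines while sorting.

```shell
# Hypothetical sample data in a throwaway directory.
tmpdir=$(mktemp -d)
printf '%s\n' cat dog cat cat dog elephant dog cat > "$tmpdir/animals"

# uniq alone only collapses repeats that are adjacent.
adjacent_only=$(uniq "$tmpdir/animals")

# Sorting first puts identical lines next to each other, so uniq catches them all.
piped=$(sort "$tmpdir/animals" | uniq)

# sort -u sorts and removes duplicate lines in one step.
short=$(sort -u "$tmpdir/animals")

echo "$adjacent_only"   # cat, dog, cat, dog, elephant, dog, cat (one per line)
echo "$piped"           # cat, dog, elephant (one per line)
echo "$short"           # same three lines as the pipeline

rm -r "$tmpdir"
```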


cut

cut is a program that extracts selected parts of each line, for example by trimming a specified number of characters off the end or the beginning. Let's say you have a document with a lot of lines that go beyond the 80-character width of your terminal and get wrapped around. You can just truncate the lines like this:

[user@host ~]$ cat 80char 
0123456789|123456789|123456789|123456789|123456789|123456789|123456789|123456789|
This is a sample document that has more than 80 characters per line, not really allowing you to view it properly on an 80 character wide terminal window.
So the solution is to use this 'cut' program that makes it easy to truncate a line in any way you wish.
You can also do this function using a program like sed, but cut makes it easier and also makes better sense when thinking of the Unix shell as a language.
[user@host ~]$ cat 80char | cut -c -80
0123456789|123456789|123456789|123456789|123456789|123456789|123456789|123456789
This is a sample document that has more than 80 characters per line, not really 
So the solution is to use this 'cut' program that makes it easy to truncate a li
You can also do this function using a program like sed, but cut makes it easier 

The -c option tells cut to select characters by position. The -80 argument to the -c option means to keep everything from the beginning of the line up to and including the 80th character. You can also specify multiple ranges by separating them with commas:

[user@host ~]$ cat 80char | cut -c 1-17,41-59,70-80
0123456789|123456|123456789|123456789|123456789
This is a sample than 80 characters not really 
So the solution igram that makes it uncate a li
You can also do togram like sed, but it easier 
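
The range forms are easier to see on a short line. This is only a sketch on a made-up 20-character string. The 16- form, which keeps everything from a position through the end of the line, is not shown above but is the mirror image of -80.

```shell
line='0123456789abcdefghij'   # hypothetical 20-character sample line

# -N keeps everything from the start of the line through character N.
head_part=$(printf '%s\n' "$line" | cut -c -5)

# N- keeps everything from character N through the end of the line.
tail_part=$(printf '%s\n' "$line" | cut -c 16-)

# Comma-separated ranges are glued together in the order they occur in the line.
pieces=$(printf '%s\n' "$line" | cut -c 1-3,11-13)

echo "$head_part"   # 01234
echo "$tail_part"   # fghij
echo "$pieces"      # 012abc
```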

cut also has other options that make it function more like awk, but I'll leave those for you to explore.
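
As a starting point for that exploration, here is a hedged sketch of what I assume are the options meant here: -d, which sets a field delimiter, and -f, which picks fields by number. The colon-separated record is made up for illustration, and the awk line is only a rough equivalent for comparison.

```shell
record='alice:x:1000:/home/alice'   # made-up colon-separated sample line

# Split on ":" and keep fields 1 and 4; cut rejoins them with the same delimiter.
fields=$(printf '%s\n' "$record" | cut -d : -f 1,4)

# Roughly the same selection written in awk.
awk_fields=$(printf '%s\n' "$record" | awk -F : '{ print $1 ":" $4 }')

echo "$fields"       # alice:/home/alice
echo "$awk_fields"   # alice:/home/alice
```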


© 2000 Suso Banderas - suso@suso.org