What does the delimiter mean in cut, and why does this command sort twice?

Question

I am trying to understand what this command does. I only know the basics, and this is what I found so far:

last | cut -d" " -f 1 | sort | uniq -c | sort

last: searches back through the file /var/log/wtmp (or the file designated by the -f flag) and displays a list of all users logged in (and out) since that file was created.

cut: shows the desired column.

The option -d specifies the field delimiter used in the input.

The option -f specifies which field you want to extract.

1 is the output, I think, but I am not sure.

Then it sorts, and then it runs uniq.

uniq: helps remove or detect duplicate entries in a file.

If anyone can explain this command, and in particular why there are two sorts, I would appreciate it.

Answer 1:

You are right about cut: cut -d" " -f1 (the space after -f is optional) takes the first field of each line, using " " (a single space) as the delimiter. So 1 is not the output; it is the number of the field to keep.
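To make the delimiter concrete, here is a small sketch run on made-up text instead of real `last` output (the usernames and addresses in `sample` are hypothetical). It also shows a common surprise: with -d" ", every single space is a separator, so a run of spaces produces empty fields.

```shell
#!/bin/sh
# Hypothetical sample standing in for `last` output.
sample='alice pts/0 10.0.0.1
bob   pts/1 10.0.0.2
alice pts/2 10.0.0.3'

# -d" " sets the delimiter to one space; -f1 keeps the first field.
printf '%s\n' "$sample" | cut -d" " -f1
# alice
# bob
# alice

# Each space is its own delimiter, so "bob   pts/1" splits into
# "bob", "", "", "pts/1", ... and field 2 of that line is empty.
printf '%s\n' "$sample" | cut -d" " -f2
# pts/0
# (empty line)
# pts/2
```

For the first column this does not matter, which is why cut -d" " -f1 works on `last` output.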

Then why sort | uniq -c | sort?

From man uniq:

Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.

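That's why you need to sort the lines before piping them to uniq. Finally, the output of uniq -c keeps the order of its input, so it is ordered by the counted text, not by the count. The second sort reorders the lines by count. A plain sort compares the lines as text, so sort -n is the more reliable choice, and sort -nr shows the most repeated items first.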
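A minimal sketch of the three stages on a hypothetical list of names (standing in for the first column of `last`). It uses sort -nr for the final step, a variant of the plain sort in the question, so the most frequent name comes first.

```shell
#!/bin/sh
# Hypothetical names; no two identical names are adjacent.
names='bob
alice
bob
carol
alice
bob'

# Without sorting, uniq -c only merges adjacent duplicates,
# so every one of the six lines is reported with a count of 1.
printf '%s\n' "$names" | uniq -c

# Sort first so identical names are adjacent, count them,
# then sort numerically by count, highest first:
# 3 bob, then 2 alice, then 1 carol (counts are left-padded).
printf '%s\n' "$names" | sort | uniq -c | sort -nr
```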

Here is an example of sort and uniq -c on a file with repeated items:

$ seq 5 >>a
$ seq 5 >>a
$ cat a
1
2
3
4
5
1
2
3
4
5

$ sort a | uniq -c | sort    # sorted first: duplicates are adjacent, so each value is counted twice
      2 1
      2 2
      2 3
      2 4
      2 5

$ uniq -c a | sort    # not sorted first: duplicates are not adjacent, so each line is counted once
      1 1
      1 1
      1 2
      1 2
      1 3
      1 3
      1 4
      1 4
      1 5
      1 5
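The man page quoted above also mentions sort -u. This sketch, using the same 1 to 5 input built without a temporary file, shows that sort -u gives the same distinct lines as sort | uniq, but it cannot report counts, which is why the pipeline needs uniq -c.

```shell
#!/bin/sh
# Same data as the example above: 1..5 written twice.
{ seq 5; seq 5; } | sort -u        # 1 2 3 4 5, one per line
{ seq 5; seq 5; } | sort | uniq    # same result
{ seq 5; seq 5; } | sort | uniq -c # only this form adds the count (2) to each line
```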

Note that you can do the sort | uniq -c step in a single awk command:

last | awk '{a[$1]++} END{for (i in a) print i, a[i]}'

This uses each value of the first column as a key in the array a[] and increments its counter every time that value appears. The END{} block then prints the results in no particular order, so you can pipe them to sort again.
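A sketch of that follow-up sort, run on hypothetical text instead of real `last` output. Because the awk command prints the name first and the count second, sort -k2,2nr sorts on the second field, numerically and in descending order. Also note that awk's default field splitting treats a run of spaces as one separator, unlike cut -d" ".

```shell
#!/bin/sh
# Hypothetical sample standing in for `last` output.
sample='alice pts/0 10.0.0.1
bob   pts/1 10.0.0.2
alice pts/2 10.0.0.3'

# Count first-column values in one pass, then order by the count (field 2).
# for (i in a) visits keys in an unspecified order, hence the sort.
printf '%s\n' "$sample" |
  awk '{a[$1]++} END{for (i in a) print i, a[i]}' |
  sort -k2,2nr
# alice 2
# bob 1
```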