What is the meaning of the delimiter in cut, and why does this command sort twice?

Submitted by 妖精的绣舞 on 2019-12-11 14:49:11

Question


I am trying to understand what this command does. I only know the basics, and this is what I have found so far:

last | cut -d" " -f 1 | sort | uniq -c | sort

last: searches back through the file /var/log/wtmp (or the file designated by the -f flag) and displays a list of all users logged in (and out) since that file was created.

cut shows the desired column.

The -d option specifies the field delimiter used in the input.

-f specifies which field to extract.

1 is the output, I think, but I am not sure.

Then the result is sorted and passed to uniq.

The uniq command is helpful to remove or detect duplicate entries in a file.

I would appreciate it if anyone could explain this command, and in particular why there are two sorts.


Answer 1:


Your explanation of cut is right: cut -d" " -f1 (no space is needed after -f) takes the first field of each input line, using " " (a single space) as the delimiter. The 1 is the number of the field to extract, not the output.
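
As a small sketch of how cut splits a line, assuming a made-up line shaped like last output (the user name, terminal and address are hypothetical):

```shell
# Hypothetical line resembling `last` output; the values are invented.
line='alice    pts/0   192.0.2.10   Mon Mar 17 09:12   still logged in'

# Field 1 is everything before the first space.
first=$(printf '%s\n' "$line" | cut -d" " -f1)

# cut treats every single space as a delimiter, so the run of spaces
# after "alice" yields empty fields: field 2 is empty, not "pts/0".
second=$(printf '%s\n' "$line" | cut -d" " -f2)

echo "field 1: [$first]"
echo "field 2: [$second]"
```

This is why the pipeline only asks for field 1: with space-padded columns, later field numbers do not line up with the visible columns.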

Then why sort | uniq -c | sort?

From man uniq:

Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.

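That's why you need to sort the lines before piping to uniq. Finally, uniq -c emits its lines in the order of the sorted input, not in order of count, so you sort again to order them by count. Note that a plain sort puts the smallest counts first; to see the most repeated items first, use sort -rn (numeric, reversed).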


See an example of sort and uniq -c for a given file with repeated items:

$ seq 5 >>a
$ seq 5 >>a
$ cat a
1
2
3
4
5
1
2
3
4
5

$ sort a | uniq -c | sort    # no repeated matches
      2 1
      2 2
      2 3
      2 4
      2 5

$ uniq -c a | sort    # repeated matches: equal lines were not adjacent
      1 1
      1 1
      1 2
      1 2
      1 3
      1 3
      1 4
      1 4
      1 5
      1 5
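
In the example above every count is equal, so the effect of the final sort is hard to see. Here is a minimal sketch with uneven counts, assuming made-up user names in place of the real last output, and using sort -rn rather than the plain sort from the question so that the largest count comes first:

```shell
# Made-up login names, one per line, standing in for `last | cut -d" " -f1`.
names='bob
alice
bob
carol
alice
bob'

# The first sort groups equal names, uniq -c counts each group,
# and sort -rn orders the counts numerically, largest first.
ranked=$(printf '%s\n' "$names" | sort | uniq -c | sort -rn)
printf '%s\n' "$ranked"

# Name with the highest count: second column of the first line.
top=$(printf '%s\n' "$ranked" | head -n 1 | awk '{print $2}')
echo "most frequent: $top"
```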

Note that you can do the whole sort | uniq -c step with a single awk command:

last | awk '{a[$1]++} END{for (i in a) print i, a[i]}'

This uses the values of the first column as keys of the a[] array and increments the matching counter each time a value appears. The END{} block then prints each value with its count. The order of for (i in a) is unspecified, so pipe the result to sort if you want it ordered.
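
A short sketch of that awk approach with the counts ordered afterwards, assuming invented sample lines instead of real last output; the sort -k2,2nr key (second field, numeric, reversed) is an addition for illustration:

```shell
# Invented sample lines; only the first column (the user name) matters.
logins='bob pts/1
alice pts/0
bob pts/2
carol pts/3
alice pts/4
bob pts/5'

# Count each first field with awk, then sort on the count column,
# numerically and in reverse, so the most frequent name is first.
counts=$(printf '%s\n' "$logins" |
  awk '{a[$1]++} END{for (i in a) print i, a[i]}' |
  sort -k2,2nr)
printf '%s\n' "$counts"

first_line=$(printf '%s\n' "$counts" | head -n 1)
```

On real last output the trailing blank line and the closing "wtmp begins" line are typically counted as well; putting the pattern NF in front of the awk action is one way to skip the empty line.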




Answer 2:


uniq -c is being used to create a frequency histogram. The second sort then orders that histogram by frequency.

The first sort is needed because uniq only compares each line with the one before it when deciding whether the line is a duplicate.



Source: https://stackoverflow.com/questions/22556470/what-is-the-meaning-of-delimiter-in-cut-and-why-in-this-command-it-is-sorting-tw
