Why does “uniq” count identical words as different?

南方客 2021-01-05 11:02

I want to calculate the frequency of the words in a file that has one word per line. The file is really big, so this might be the problem (it counts 300k lines in th

4 answers
  •  小蘑菇 (OP)
    2021-01-05 11:53

    The size of the file has nothing to do with what you're seeing. From the man page of uniq(1):

    Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.

    So running uniq on

    a
    b
    a
    

    will return the input unchanged, because the two "a" lines are not adjacent:

    a
    b
    a
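
    Following the man page's advice, a minimal sketch of a frequency count sorts the input first so that identical lines end up adjacent. The helper name count_words is made up for illustration, and forcing LC_ALL=C is an assumption that byte-wise comparison is acceptable for your data:

```shell
# count_words is a hypothetical helper name, not a standard command.
# It reads one word per line on stdin and prints "count word" pairs,
# most frequent first.
count_words() {
    # sort makes identical lines adjacent so uniq can merge them.
    # LC_ALL=C (an assumption) makes both tools compare bytes, so the
    # result does not depend on the locale's collation rules.
    LC_ALL=C sort | LC_ALL=C uniq -c | sort -rn
}

# The example input from above: "a" is now counted twice, "b" once.
printf 'a\nb\na\n' | count_words
```

    For the real file, something like count_words < words.txt should work, where words.txt is a placeholder for your file name.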
    
