I want to calculate the frequency of the words in a file, where the words are one per line. The file is really big, so this might be the problem (it counts 300k lines in th
The size of the file has nothing to do with what you're seeing. From the man page of uniq(1):
Note: 'uniq' does not detect repeated lines unless they are adjacent. You may want to sort the input first, or use 'sort -u' without 'uniq'. Also, comparisons honor the rules specified by 'LC_COLLATE'.
So running uniq on:

    a
    b
    a

will return:

    a
    b
    a
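Here is a minimal sketch of the fix the man page suggests: sort first so identical lines become adjacent, then let uniq collapse them. To count frequencies, add `uniq -c`. It assumes the words sit one per line in a file; `words.txt` is a hypothetical name, filled here with the three-line example above so the script is self-contained.

```shell
#!/bin/sh
# Hypothetical input file: the a/b/a example, one word per line.
printf 'a\nb\na\n' > words.txt

# uniq alone: the two "a" lines are not adjacent, so both survive (a, b, a).
uniq words.txt

# Sorting first makes identical lines adjacent, so uniq collapses them (a, b).
sort words.txt | uniq

# Word frequencies: -c prefixes each distinct line with its count,
# and sort -rn puts the most frequent words first.
sort words.txt | uniq -c | sort -rn
```

Sorting also handles a file with 300k lines comfortably. GNU sort, for example, switches to temporary files when the input does not fit in memory. As the man page notes, the sort order and the comparisons depend on `LC_COLLATE`. Setting `LC_ALL=C` on both commands gives plain byte-wise behaviour if that matters for your words.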