Getting the count of unique values in a column in bash

一整个雨季 2021-01-30 08:14

I have tab delimited files with several columns. I want to count the frequency of occurrence of the different values in a column for all the files in a folder and sort them in d

5 answers
  •  庸人自扰
    2021-01-30 08:25

    The GNU site suggests this nice awk script, which prints both the words and their frequency.

    Possible changes:

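    • To see the result in descending order of frequency, swap word and freq[word] in the printf (and swap %s and %d in its format string) so that the count comes first, then pipe the output through sort -nr.
    • If you want a specific column, you can omit the for loop and simply write freq[$3]++, replacing 3 with the column number. For tab-delimited input, also run awk with -F'\t' so that values containing spaces are not split, and drop the tolower and gsub lines if the values should be counted exactly as they appear.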

    Here goes:

     # wordfreq.awk --- print list of word frequencies
    
     {
         $0 = tolower($0)    # remove case distinctions
         # remove punctuation
         gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
         for (i = 1; i <= NF; i++)
             freq[$i]++
     }
    
     END {
         for (word in freq)
             printf "%s\t%d\n", word, freq[word]
     }
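
For the tab-delimited case in the question, the two changes listed above can be combined into a short shell function. This is only a minimal sketch, assuming a POSIX shell with awk and sort available; the name count_column and its argument order are hypothetical, not part of the GNU script. It counts the raw values (no lowercasing or punctuation stripping), and any header line in a file is counted like every other line.

```shell
# count_column (hypothetical helper): print "count<TAB>value" for one column
# across all given tab-delimited files, most frequent value first.
# Usage: count_column COLUMN FILE...
count_column() {
    col=$1
    shift
    awk -F '\t' -v col="$col" '
        { freq[$col]++ }    # tally the chosen field on every input line
        END {
            for (value in freq)
                printf "%d\t%s\n", freq[value], value
        }
    ' "$@" | sort -t "$(printf '\t')" -k1,1nr -k2,2   # count descending, then value
}
```

A call such as count_column 3 /path/to/folder/* would then tally the third column over every file in that folder; the path here is a placeholder.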
    
