I have tab delimited files with several columns. I want to count the frequency of occurrence of the different values in a column for all the files in a folder and sort them in d
The GNU site suggests this nice awk script, which prints both the words and their frequency.
Possible changes:
sort -nr
(and reverse word
and freq[word]
) to see the result in descending order.freq[3]++
- replace 3 with the column number.Here goes:
# wordfreq.awk --- print list of word frequencies
{
$0 = tolower($0) # remove case distinctions
# remove punctuation
gsub(/[^[:alnum:]_[:blank:]]/, "", $0)
for (i = 1; i <= NF; i++)
freq[$i]++
}
END {
for (word in freq)
printf "%s\t%d\n", word, freq[word]
}