Bash script to find the frequency of every letter in a file

眼角桃花 2021-02-06 23:26

I am trying to find out how often each letter of the English alphabet appears in an input file. How can I do this in a bash script?

5 answers
  • 2021-02-07 00:07

    Similar to mouviciel's sed answer, but more portable to the Bourne and Korn shells and the non-GNU sed found on BSD systems. GNU sed supports \n in a replacement; without it, you can backslash-escape a literal newline instead:

    sed -e's/./&\
    /g' file | sort | uniq -c | sort -nr
    

    Or, to avoid the visual split on the screen, insert a literal newline by typing CTRL+V CTRL+J (shown here as ^J):

    sed -e's/./&\^J/g' file | sort | uniq -c | sort -nr
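
    When the backslash-newline form goes into a script file, the line after the backslash has to start at column 0, because any indentation there becomes part of the replacement text. A minimal sketch, assuming a hypothetical function name count_chars and the file name passed as the first argument:

```shell
# count_chars is a hypothetical name, not part of the original answer.
count_chars() {
    # The replacement is "&" followed by an escaped literal newline.
    # The "/g'" line must not be indented, or the spaces would be inserted too.
    sed -e 's/./&\
/g' "$1" | sort | uniq -c | sort -nr
}
```

    Called as count_chars file, this prints one line per distinct character with its count. Each input line also contributes one empty output line from sed, so the empty string appears in the counts as well.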
    
  • 2021-02-07 00:25

    My solution using grep, sort and uniq.

    grep -o . file | sort | uniq -c
    

    Ignore case:

    grep -o . file | sort -f | uniq -ic
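
    The commands above count every character, including spaces and punctuation. As a rough sketch of a letters-only, case-insensitive variant: the function name letter_freq is hypothetical, and grep -o is assumed to be available (it is in GNU and BSD grep, but is not required by POSIX).

```shell
# letter_freq is a hypothetical name used only for this sketch.
letter_freq() {
    # grep -o prints each match on its own line; [[:alpha:]] keeps letters only.
    # tr folds everything to lower case before counting.
    grep -o '[[:alpha:]]' "$1" | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -nr
}
```

    Called as letter_freq file, this lists the most frequent letter first.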
    
  • 2021-02-07 00:26

    A solution with sed, sort and uniq:

    sed 's/\(.\)/\1\n/g' file | sort | uniq -c
    

    This counts every character, not only letters, and relies on GNU sed accepting \n in the replacement. To keep only letters, filter with grep:

    sed 's/\(.\)/\1\n/g' file | grep '[A-Za-z]' | sort | uniq -c
    

    If you want to treat uppercase and lowercase as the same letter, add a tr translation:

    sed 's/\(.\)/\1\n/g' file | tr '[:upper:]' '[:lower:]' | grep '[a-z]' | sort | uniq -c
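
    These pipelines only list letters that actually occur. If a row is wanted for every letter of the alphabet, including those with a count of zero, one possible sketch is below. The function name letter_table is hypothetical, grep -o is used here in place of sed to split the text, and only ASCII a-z are counted.

```shell
# letter_table is a hypothetical name; it prints "letter count" for a..z.
letter_table() {
    # Fold to lower case, emit one ASCII letter per line, and count them.
    tr '[:upper:]' '[:lower:]' < "$1" | LC_ALL=C grep -o '[a-z]' | sort | uniq -c |
    awk '{ n[$2] = $1 }   # remember the count for each letter seen
         END {
             # 97..122 are the ASCII codes for a..z; missing letters print 0.
             for (i = 97; i <= 122; i++) { c = sprintf("%c", i); print c, n[c] + 0 }
         }'
}
```

    Called as letter_table file, this always prints 26 lines, in alphabetical order.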
    
  • 2021-02-07 00:27

    Here is a suggestion that reads the file one character at a time, with INPUT_FILE set to the path of the file:

    while read -n 1 c
    do
        echo "$c"
    done < "$INPUT_FILE" | grep '[[:alpha:]]' | sort | uniq -c | sort -nr
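
    A sketch of the same loop wrapped in a function that takes the file name as an argument. The name count_letters_loop is hypothetical. IFS= and -r are added so that read leaves whitespace and backslashes untouched, and printf replaces echo so that a character such as - is never taken as an option. The -n option of read is a bash feature, so this assumes bash.

```shell
# count_letters_loop is a hypothetical name; requires bash for "read -n".
count_letters_loop() {
    local c
    # Read one character per iteration; newlines arrive as an empty string.
    while IFS= read -r -n 1 c; do
        printf '%s\n' "$c"
    done < "$1" | grep '[[:alpha:]]' | sort | uniq -c | sort -nr
}
```

    Called as count_letters_loop file, this gives the same kind of output as the loop above. For large files it is noticeably slower than the grep, sed or awk approaches, since the shell handles every character itself.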
    
  • 2021-02-07 00:28

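    Just one awk command. Setting FS to the empty string makes each character its own field; gawk and several other awk implementations support this, but POSIX does not require it: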

    awk -vFS="" '{for(i=1;i<=NF;i++)w[$i]++}END{for(i in w) print i,w[i]}' file
    

    If you want the count to be case-insensitive, add tolower():

    awk -vFS="" '{for(i=1;i<=NF;i++)w[tolower($i)]++}END{for(i in w) print i,w[i]}' file
    

    If you want only letters:

    awk -vFS="" '{for(i=1;i<=NF;i++){ if($i~/[a-zA-Z]/) { w[tolower($i)]++} } }END{for(i in w) print i,w[i]}' file
    

    If you want only digits, change /[a-zA-Z]/ to /[0-9]/.

    If you do not want non-ASCII (Unicode) letters to match, run export LC_ALL=C first, so that ranges such as [a-zA-Z] cover only ASCII.
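
    The awk commands above print letters in whatever order the array is traversed. As a sketch of a variant with sorted output: the function name awk_letter_freq is hypothetical, and substr() is used here instead of an empty FS so that it does not depend on that behaviour.

```shell
# awk_letter_freq is a hypothetical name used only for this sketch.
awk_letter_freq() {
    awk '
        {
            # Walk the line one character at a time with substr().
            for (i = 1; i <= length($0); i++) {
                c = tolower(substr($0, i, 1))
                if (c ~ /^[a-z]$/) w[c]++
            }
        }
        END { for (c in w) print w[c], c }
    ' "$1" | sort -k1,1nr -k2,2   # highest count first, ties in alphabetical order
}
```

    Called as awk_letter_freq file, this prints the count followed by the letter on each line.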
