How can I count the most frequently occurring sequences of 3 letters within a word with a bash script?

挽巷 2021-01-14 07:31

I have a sample file like this:

XYZAcc
ABCAccounting
Accounting firm
Accounting Aco
Accounting Acompany
Acoustical consultant

Here I need to grep

3 Answers
  •  不知归路
    2021-01-14 07:56

    Here's how to get started with what I THINK you're trying to do:

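    $ cat tst.awk
    # Count every overlapping substring of length stringLgth in each
    # whitespace-separated field, ignoring case.
    BEGIN { stringLgth = 3 }
    {
        for (fldNr=1; fldNr<=NF; fldNr++) {
            field = $fldNr
            fieldLgth = length(field)
            if ( fieldLgth >= stringLgth ) {
                # Last start position that still leaves stringLgth characters
                maxBegPos = fieldLgth - (stringLgth - 1)
                for (begPos=1; begPos<=maxBegPos; begPos++) {
                    string = tolower(substr(field,begPos,stringLgth))
                    cnt[string]++
                }
            }
        }
    }
    END {
        # Print each sequence with its count ("for in" order is unspecified)
        for (string in cnt) {
            print string, cnt[string]
        }
    }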
    


    $ awk -f tst.awk file | sort -k2,2nr
    acc 5
    cou 5
    cco 4
    ing 4
    nti 4
    oun 4
    tin 4
    unt 4
    aco 3
    abc 1
    ant 1
    any 1
    bca 1
    cac 1
    cal 1
    com 1
    con 1
    fir 1
    ica 1
    irm 1
    lta 1
    mpa 1
    nsu 1
    omp 1
    ons 1
    ous 1
    pan 1
    sti 1
    sul 1
    tan 1
    tic 1
    ult 1
    ust 1
    xyz 1
    yza 1
    zac 1
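
Since the output above is sorted by count, `head -n 1` would show one of the top sequences, but it silently drops ties (here `acc` and `cou` both occur 5 times). A minimal sketch, assuming bash and a POSIX awk, that prints only the most frequent sequence(s) follows. `top_trigrams` is a hypothetical function name, and its optional first argument (the sequence length, default 3) plays the same role as `stringLgth` in tst.awk.

```shell
#!/usr/bin/env bash
# Hypothetical helper: top_trigrams [LENGTH]
# Reads text on stdin, counts every overlapping LENGTH-letter sequence
# (default 3) in each whitespace-separated field, ignoring case, and
# prints only the sequence(s) with the highest count, ties sorted.
top_trigrams() {
    awk -v len="${1:-3}" '
    {
        for (i = 1; i <= NF; i++) {
            field = tolower($i)
            # Fields shorter than len produce no iterations
            for (pos = 1; pos <= length(field) - len + 1; pos++)
                cnt[substr(field, pos, len)]++
        }
    }
    END {
        # First pass: find the highest count
        max = 0
        for (s in cnt) if (cnt[s] > max) max = cnt[s]
        # Second pass: print every sequence that reaches it
        for (s in cnt) if (cnt[s] == max) print s, cnt[s]
    }' | LC_ALL=C sort
}

# Example: top_trigrams < file
```

Usage: `top_trigrams < file` on the sample file should print `acc 5` and `cou 5`. The final `sort` only exists because awk's `for (s in cnt)` order is unspecified, so ties would otherwise come out in arbitrary order.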
    
