How can I count the most occurring sequence of 3 letters within a word with a bash script?

挽巷 2021-01-14 07:31

I have a sample file like

XYZAcc
ABCAccounting
Accounting firm
Accounting Aco
Accounting Acompany
Acoustical consultant

Here I need to grep

3 Answers

执笔经年 (original poster)

2021-01-14 08:06
This is an alternative to Ed Morton's solution. It loops less but needs a bit more memory. The idea is to ignore spaces and other non-alphabetic characters while counting, and to filter out any sequence that contains them at the end.
```
awk -v n=3 '{ for(i=length-n+1;i>0;--i) a[tolower(substr($0,i,n))]++ }
            END {for(s in a) if (s !~ /[^a-z]/) print s,a[s] }' file
```
With GNU awk, you can simplify and optimize this by making each word its own record. That way the selection at the end is no longer needed:
```
awk -v n=3 -v RS='[[:space:]]' '
    (length>=n){ for(i=length-n+1;i>0;--i) a[tolower(substr($0,i,n))]++ }
    END {for(s in a) print s,a[s] }' file
```
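Both commands print every sequence with its count. To get only the most frequent one, the output can be sorted by count. The following is a minimal sketch, not part of the original answer: the function name `top_ngram` and its argument order are hypothetical, and ties are broken alphabetically, which is an arbitrary choice.

```shell
# Hypothetical helper: print the most frequent n-letter sequence in a file
# as "count sequence". Ties are resolved alphabetically (an assumption).
top_ngram() {
    # $1 = input file, $2 = sequence length (defaults to 3)
    awk -v n="${2:-3}" '
        # Count every n-character window of the line, case-insensitively.
        { for (i = length($0) - n + 1; i > 0; --i) a[tolower(substr($0, i, n))]++ }
        # Keep only windows made purely of letters, printing the count first.
        END { for (s in a) if (s !~ /[^a-z]/) print a[s], s }
    ' "$1" | sort -k1,1nr -k2,2 | head -n 1
}
```

Calling `top_ngram file` uses sequences of 3 letters, and `top_ngram file 2` would count pairs instead. Replacing `head -n 1` with `head -n 5` would list the five most frequent sequences.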