Find the number of occurrences and add it next to the pattern

只愿长相守 submitted on 2019-12-11 04:59:26

Question


I have several files in a directory, and in some of them certain patterns occur multiple times. For example:

Contents of file "8_list":

Spiroplasma_taiwanense 
Spiroplasma_diminutum 
Spiroplasma_apis 
Spiroplasma_sabaudiense 
Spiroplasma_taiwanense 
Spiroplasma_diminutum 
Spiroplasma_taiwanense 
EntAcro10
EntAcro10
Spiroplasma_apis 
Spiroplasma_culicicola 
Spiroplasma_sabaudiense 
Spiroplasma_diminutum 
Spiroplasma_sabaudiense 
Spiroplasma_sabaudiense 
Spiroplasma_sabaudiense 
Spiroplasma_apis 
Spiroplasma_culicicola 
Spiroplasma_culicicola 
Spiroplasma_culicicola 
Spiroplasma_culicicola 
Spiroplasma_diminutum 
Spiroplasma_culicicola 
Spiroplasma_culicicola 
EntAcro1

And the contents of file "574_list":

Mesoplasma_florum_l1
Spiroplasma_sabaudiense 
Mesoplasma_florum_w37
EntAcro1

All files have a single column. Within each file, I want to find the identical patterns and append a number to each one indicating which occurrence it is. For example, if Spiroplasma_culicicola occurs 7 times in file "8_list", the first occurrence should become Spiroplasma_culicicola_1, the second Spiroplasma_culicicola_2, the third Spiroplasma_culicicola_3, and so on.

I tried to do it with sed by handling each pattern individually:

sed -z 's/Spiroplasma_culicicola/Spiroplasma_culicicola_2/2'

but I was wondering whether there is an easier way to do this for all my files and all patterns in a given directory.

Thanks in advance.


Answer 1:


This is a good task for a tool like awk:

awk '{gsub(" ", "", $0); a[$0]++; print $0"_"a[$0]}' 8_list

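gsub(" ", "", $0); - removes every space character from the line, which here strips the trailing space after each name

a[$0]++; - increments the number of occurrences seen so far for the current value, using the value itself as the array key; print then appends that count after an underscore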
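
Note that gsub(" ", "", $0) deletes every space on the line, not only a trailing one. If a name could contain an internal space (a hypothetical case; none of the sample names do), a variant that trims only trailing whitespace might look like the sketch below. The function name number_occurrences and the sample input are illustrative assumptions, not part of the original answer.

```shell
# Sketch: number each occurrence while trimming only trailing whitespace.
# number_occurrences is a hypothetical helper name.
number_occurrences() {
    # sub() with a $-anchored regex removes trailing spaces, tabs and
    # carriage returns, leaving any internal spaces untouched.
    awk '{ sub(/[ \t\r]+$/, ""); count[$0]++; print $0 "_" count[$0] }' "$@"
}

# Hypothetical input: the first and third lines carry a trailing space.
printf 'Spiroplasma apis \nEntAcro10\nSpiroplasma apis \n' | number_occurrences
# prints:
# Spiroplasma apis_1
# EntAcro10_1
# Spiroplasma apis_2
```

With the sample files from the question, this should produce the same result as the gsub version, since their only spaces are trailing ones.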


The output:

Spiroplasma_taiwanense_1
Spiroplasma_diminutum_1
Spiroplasma_apis_1
Spiroplasma_sabaudiense_1
Spiroplasma_taiwanense_2
Spiroplasma_diminutum_2
Spiroplasma_taiwanense_3
EntAcro10_1
EntAcro10_2
Spiroplasma_apis_2
Spiroplasma_culicicola_1
Spiroplasma_sabaudiense_2
Spiroplasma_diminutum_3
Spiroplasma_sabaudiense_3
Spiroplasma_sabaudiense_4
Spiroplasma_sabaudiense_5
Spiroplasma_apis_3
Spiroplasma_culicicola_2
Spiroplasma_culicicola_3
Spiroplasma_culicicola_4
Spiroplasma_culicicola_5
Spiroplasma_diminutum_4
Spiroplasma_culicicola_6
Spiroplasma_culicicola_7
EntAcro1_1
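
The question also asks how to do this for every file in a directory. One possible approach, sketched below, is to run the same awk program once per file so that the counts start over for each file. It assumes that all input files match the glob *_list (as "8_list" and "574_list" do) and writes each result next to its input with a ".numbered" suffix; the function name number_all_lists and that suffix are illustrative choices, not something given in the original answer.

```shell
# Sketch: apply the numbering to every *_list file in a directory.
# number_all_lists and the ".numbered" suffix are hypothetical names.
number_all_lists() {
    dir=${1:-.}                      # default to the current directory
    for f in "$dir"/*_list; do
        [ -f "$f" ] || continue      # skip if the glob matched nothing
        # A separate awk process per file resets the counts for each file.
        awk '{ gsub(" ", "", $0); a[$0]++; print $0 "_" a[$0] }' "$f" > "$f.numbered"
    done
}

# Usage (the path is a placeholder):
#   number_all_lists /path/to/directory
```

Writing to a separate output file keeps the originals intact, so the result can be checked before replacing anything.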


Source: https://stackoverflow.com/questions/42905237/find-the-number-of-occurences-and-add-it-next-to-the-pattern
