Question
I have several files in a directory, and in some of them certain patterns occur multiple times. For example, the contents of file "8_list":
Spiroplasma_taiwanense
Spiroplasma_diminutum
Spiroplasma_apis
Spiroplasma_sabaudiense
Spiroplasma_taiwanense
Spiroplasma_diminutum
Spiroplasma_taiwanense
EntAcro10
EntAcro10
Spiroplasma_apis
Spiroplasma_culicicola
Spiroplasma_sabaudiense
Spiroplasma_diminutum
Spiroplasma_sabaudiense
Spiroplasma_sabaudiense
Spiroplasma_sabaudiense
Spiroplasma_apis
Spiroplasma_culicicola
Spiroplasma_culicicola
Spiroplasma_culicicola
Spiroplasma_culicicola
Spiroplasma_diminutum
Spiroplasma_culicicola
Spiroplasma_culicicola
EntAcro1
and the contents of file "574_list":
Mesoplasma_florum_l1
Spiroplasma_sabaudiense
Mesoplasma_florum_w37
EntAcro1
All files have a single column.
What I want to do is, within each file, find the identical patterns and append a number to each one giving its occurrence count. For example, if Spiroplasma_culicicola occurs 7 times in file "8_list", the first occurrence should become Spiroplasma_culicicola_1, the second Spiroplasma_culicicola_2, the third Spiroplasma_culicicola_3, and so on.
I tried to do it with sed by handling each pattern individually:
sed -z 's/Spiroplasma_culicicola/Spiroplasma_culicicola_2/2'
but I was wondering whether there is an easier way to do this for all patterns in all files in a given directory.
Thanks in advance.
Answer 1:
This is a good task for awk:
awk '{gsub(" ", "", $0); a[$0]++; print $0"_"a[$0]}' 8_list
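gsub(" ", "", $0); - removes every space character from the line, which strips the trailing space at the end of each line (safe here only because the names themselves contain no spaces)
a[$0]++; - increments the occurrence count of each pattern (the column value), using the value as an array key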
The output:
Spiroplasma_taiwanense_1
Spiroplasma_diminutum_1
Spiroplasma_apis_1
Spiroplasma_sabaudiense_1
Spiroplasma_taiwanense_2
Spiroplasma_diminutum_2
Spiroplasma_taiwanense_3
EntAcro10_1
EntAcro10_2
Spiroplasma_apis_2
Spiroplasma_culicicola_1
Spiroplasma_sabaudiense_2
Spiroplasma_diminutum_3
Spiroplasma_sabaudiense_3
Spiroplasma_sabaudiense_4
Spiroplasma_sabaudiense_5
Spiroplasma_apis_3
Spiroplasma_culicicola_2
Spiroplasma_culicicola_3
Spiroplasma_culicicola_4
Spiroplasma_culicicola_5
Spiroplasma_diminutum_4
Spiroplasma_culicicola_6
Spiroplasma_culicicola_7
EntAcro1_1
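The command above handles a single file. To cover every file in a directory, as the question asks, one option is to wrap it in a shell loop. The following is a minimal sketch, not part of the original answer: the function name number_lists, the *_list glob, and the .numbered output suffix are assumptions made for illustration. It also trims only trailing whitespace with sub() rather than deleting every space.

```shell
#!/bin/sh
# Hypothetical helper: number repeated lines in every *_list file of a directory.
# Assumptions: files match the glob *_list; results are written to <file>.numbered
# so the originals are left untouched.
number_lists() {
    dir=${1:-.}
    for f in "$dir"/*_list; do
        [ -f "$f" ] || continue   # skip when the glob matched nothing
        # A fresh awk process per file means the counters restart for each file.
        awk '{ sub(/[ \t\r]+$/, ""); count[$0]++; print $0 "_" count[$0] }' "$f" > "$f.numbered"
    done
}
```

Calling number_lists with no argument processes the current directory; passing a path processes that directory instead. Because awk runs once per file, a name that appears in both "8_list" and "574_list" is counted separately in each.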
Source: https://stackoverflow.com/questions/42905237/find-the-number-of-occurences-and-add-it-next-to-the-pattern