Question:
Imagine that you want to keep, for each category defined by one field of a table, the record with the highest value in another field, comparing only records within the same category and ignoring the contents of the remaining fields.
So, given the input nye.txt:
X A 10.00
X A 1.50
X B 0.01
X B 4.00
Y C 1.00
Y C 2.43
You'd expect this output:
X A 10.00
Y C 2.43
This is an offshoot of a previous, related thread: awk: keep records with the highest value, comparing those that share other fields
I already have a solution (see below), but ideas are welcome!
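A different technique, not taken from the thread: let sort order the lines so that, within each group of identical 1st fields, the largest 3rd-field value comes first, then let awk print only the first line of each group. This is a minimal sketch assuming whitespace-separated fields and numbers that sort -n understands; top_by_sort is a hypothetical helper name.

```shell
#!/bin/sh
# Sample input from the question.
cat > nye.txt <<'EOF'
X A 10.00
X A 1.50
X B 0.01
X B 4.00
Y C 1.00
Y C 2.43
EOF

# top_by_sort: hypothetical helper (not from the thread).
# 1. sort groups lines by field 1 (-k1,1) and, inside each group, puts the
#    largest field 3 first (-k3,3nr = numeric, reversed).
# 2. awk prints a line only the first time its field-1 value is seen.
# LC_ALL=C pins byte-order collation and "." as the decimal point for -n.
top_by_sort() {
    LC_ALL=C sort -k1,1 -k3,3nr "$@" | awk '!seen[$1]++'
}

top_by_sort nye.txt
```

Because the whole file goes through sort, the result comes out ordered by the 1st field, and negative values need no special handling.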
Answer 1:
Something like this with awk:
awk '$3>=a[$1]{a[$1]=$3; b[$1]=$0} END{for(i in a)print b[i]}' File
For each value in the 1st column (X, Y, etc.), if the 3rd column value is greater than or equal to the largest value stored so far for that key (i.e. a[$1], which is initially 0 by default), update a[$1] with this 3rd column value and save the entire line in array b. The END block prints the saved lines.
Output:
$ awk '$3>a[$1]{a[$1]=$3; b[$1]=$0} END{for(i in a)print b[i]}' File
X A 10.00
Y C 2.43
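Two details of the one-liner are worth knowing: because an unset a[$1] compares as 0, a category whose values are all negative never gets its line stored, and >= keeps the last of several tied lines while > keeps the first. A minimal variant, assuming numeric 3rd fields: accept the first line of each category outright, compare numerically with >, and pipe through sort because the order of for (i in b) is unspecified. keep_max is a hypothetical wrapper name.

```shell
#!/bin/sh
# Sample input from the question.
cat > nye.txt <<'EOF'
X A 10.00
X A 1.50
X B 0.01
X B 4.00
Y C 1.00
Y C 2.43
EOF

# keep_max: hypothetical wrapper around a variant of Answer 1's one-liner.
# "!($1 in a)" accepts the first line of each category outright, so there is
# no implicit 0 baseline and all-negative categories still get a line.
# "$3 + 0 > a[$1]" compares numerically; the strict ">" keeps the first of
# several tied lines. "for (i in b)" has no defined order, hence the sort.
keep_max() {
    awk '!($1 in a) || $3 + 0 > a[$1] { a[$1] = $3 + 0; b[$1] = $0 }
         END { for (i in b) print b[i] }' "$@" | LC_ALL=C sort
}

keep_max nye.txt
```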
Answer 2:
My solution is:
awk '{ k=$1 } { split(a[k],b," ") } $3>b[2] { a[k]=$2" "$3 } END { for (i in a) print i,a[i] }' nye.txt
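The first brace block, { k=$1 }, selects the field that defines the categories; the pattern $3>b[2] then compares the other field against the maximum stored so far, which split() recovers as b[2] from the saved "$2 $3" string. Here the categories come from the 1st field and the compared values from the 3rd.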
(based on https://stackoverflow.com/a/29239235/3298298)
Ideas welcome!
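Answer 2 hard-codes the compared field and rebuilds each output line from the key, $2 and $3, so any other fields of a wider record are dropped. A minimal sketch, assuming whitespace-separated fields chosen by number, that passes both field numbers in with awk's -v option and stores the whole record instead; max_by is a hypothetical helper name.

```shell
#!/bin/sh
# Sample input from the question.
cat > nye.txt <<'EOF'
X A 10.00
X A 1.50
X B 0.01
X B 4.00
Y C 1.00
Y C 2.43
EOF

# max_by KEYFIELD VALFIELD FILE...  (hypothetical helper, not from the thread)
# For each distinct value of field KEYFIELD, print the whole record with the
# largest numeric value in field VALFIELD. Records come out in the order in
# which their category first appears; ties keep the earliest record.
max_by() {
    kf=$1 vf=$2
    shift 2
    awk -v k="$kf" -v v="$vf" '
        { key = $k; val = $v + 0 }
        !(key in best) || val > best[key] {
            if (!(key in best)) order[++n] = key   # remember first appearance
            best[key] = val
            rec[key] = $0                           # keep the full line
        }
        END { for (i = 1; i <= n; i++) print rec[order[i]] }
    ' "$@"
}

max_by 1 3 nye.txt   # categories from field 1, compare field 3
max_by 2 3 nye.txt   # categories from field 2 instead
```

The second call groups by the 2nd field, so it prints one line per category A, B and C: X A 10.00, X B 4.00 and Y C 2.43.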
Source: https://stackoverflow.com/questions/29253200/awk-keep-records-with-the-highest-value-that-share-a-field-while-ignoring-othe