awk: keep records with the highest value that share a field, while ignoring other fields

Question

Imagine that you want to keep the records with the highest value in a given field of a table, comparing only within the categories defined by another field (and ignoring the contents of the remaining fields).

So, given the input nye.txt:

X A 10.00
X A 1.50
X B 0.01
X B 4.00
Y C 1.00
Y C 2.43

You'd expect this output:

X A 10.00
Y C 2.43

This is an offshoot of a previous, related thread: awk: keep records with the highest value, comparing those that share other fields

I already have a solution (see below), but ideas are welcome!

Answer 1:

Something like this with awk:

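awk '!($1 in a) || $3>a[$1] {a[$1]=$3; b[$1]=$0} END{for(i in a) print b[i]}' File

For each value in the 1st column (X, Y, etc.), if that value has not been seen before, or if its 3rd column value is greater than the largest value stored so far for it (a[$1]), update a[$1] with this 3rd column value and save the entire line in array b. In the END block, print the saved line for each key. Checking $1 in a first means the first line of every group is always stored, so the command also works when all the values in a group are zero or negative (relying on an unset a[$1] defaulting to 0 would not). With a strict >, the first of several equal maxima is kept. Note that for (i in a) visits the keys in an unspecified order, so the output lines may not follow the input order.

Output:

$ awk '!($1 in a) || $3>a[$1] {a[$1]=$3; b[$1]=$0} END{for(i in a) print b[i]}' File
X A 10.00
Y C 2.43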
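Since for (i in a) does not preserve input order, here is a minimal sketch (not part of the original answer) that also remembers the order in which each key first appears and prints the results in that order. The function name max_by_first_field is hypothetical, and it assumes whitespace-separated fields with a numeric 3rd field.

```shell
# Hypothetical helper: for each value of field 1, keep the line with the
# largest numeric field 3, printing keys in order of first appearance.
# Reads the named files, or standard input if none are given.
max_by_first_field() {
  awk '
    !($1 in best) {              # first time this key is seen
      order[++n] = $1            # remember its position in the input
      best[$1] = $3; line[$1] = $0
      next
    }
    $3 > best[$1] {              # strictly larger, so the first of any ties wins
      best[$1] = $3; line[$1] = $0
    }
    END { for (i = 1; i <= n; i++) print line[order[i]] }
  ' "$@"
}
```

For example, max_by_first_field nye.txt prints X A 10.00 followed by Y C 2.43. The extra order array costs one entry per key, the same as the arrays the original answer already keeps.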

Answer 2:

My solution is:

awk '{ k=$1 } { split(a[k],b," ") } $3>b[2] { a[k]=$2" "$3 } END { for (i in a) print i,a[i] }' nye.txt

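The first brace block ({ k=$1 }) selects the field that defines the categories, and the condition $3>b[2] selects the field whose values are compared (the 1st and 3rd fields, respectively, in this case). Each category's current best is stored in a[k] as the string "$2 $3", and split() breaks it back apart so that b[2] holds the stored value. For a new category b[2] is unset and compares as 0, so this version assumes the values are positive.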

(based on https://stackoverflow.com/a/29239235/3298298)
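The same idea can be made independent of particular field numbers. This sketch (the helper name max_by_field and its argument order are hypothetical, not from the original answer) passes the category field and the compared field in with awk -v, and prints the whole winning line instead of rebuilding it from the stored fields.

```shell
# Hypothetical helper: for each distinct value of field KEY, print the line
# with the largest numeric value in field VAL.
# Usage: max_by_field KEY VAL [file...]
max_by_field() {
  local key=$1 val=$2
  shift 2
  awk -v k="$key" -v v="$val" '
    !($k in best) || $v > best[$k] {   # new category, or a strictly larger value
      best[$k] = $v; line[$k] = $0
    }
    END { for (i in line) print line[i] }   # key order is unspecified
  ' "$@"
}
```

For example, max_by_field 1 3 nye.txt compares within the 1st field as above, while max_by_field 2 3 nye.txt would compare within the 2nd field (A, B, C) instead.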

Ideas welcome!
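A different technique, not used in either answer, is to let sort do the comparison: order the lines by the category field and, within each category, by the value field in descending numeric order, then keep only the first line of each category. This is a sketch assuming whitespace-separated fields; max_via_sort is a hypothetical name.

```shell
# Hypothetical helper: sort by field 1, then by field 3 numerically in
# descending order, and keep the first (largest) line for each field-1 value.
# LC_ALL=C fixes both the key collation and the decimal point used by -n.
max_via_sort() {
  LC_ALL=C sort -k1,1 -k3,3nr "$@" | awk '!seen[$1]++'
}
```

This trades the single pass of the awk-only versions for a full sort of the input, and the output comes out in sorted key order rather than input order.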

Source: https://stackoverflow.com/questions/29253200/awk-keep-records-with-the-highest-value-that-share-a-field-while-ignoring-othe

Tags

bash

awk

gawk
