Use awk to sum or average for each unique ID

一整个雨季 2020-12-04 00:10

Can anyone tell me how to use awk to calculate the sum of two individual columns, or the average of one column, for each unique ID?

Input

ch         


        
1 Answer
  • 2020-12-04 00:51

    sum of columns 5 and 6 per id:

    awk '{sum5[$10] += $5; sum6[$10] += $6}; END{ for (id in sum5) { print id, sum5[id], sum6[id] } }' < /tmp/input 
    NM_00175642 25 2
    NM_01011874 21 1
    

    Explained: $10 is the id field, and $5 and $6 are columns 5 and 6. We build two arrays, one for each column being summed. awk arrays are indexed by strings, so the id field can be used directly as the key. Once all the lines (records) have been processed, the END block iterates over the array keys (the id strings) and prints the sums stored under each key.
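To make the summing step easy to try, here is a minimal sketch run against made-up data. The sample rows, the ids NM_A and NM_B, and the temporary file are assumptions for illustration only, not the asker's real input. The output is piped through sort because awk does not guarantee any particular order for `for (id in array)`.

```shell
# Hypothetical sample: 10 whitespace-separated fields, id in field 10.
input=$(mktemp)
cat > "$input" <<'EOF'
chr1 100 200 0.04 10 1 a b c NM_A
chr1 300 400 0.06 15 1 a b c NM_A
chr2 500 600 0.25 21 1 a b c NM_B
EOF

# Sum fields 5 and 6 per id, then sort for a stable order.
sums=$(awk '
    { sum5[$10] += $5; sum6[$10] += $6 }
    END { for (id in sum5) print id, sum5[id], sum6[id] }
' "$input" | LC_ALL=C sort)

printf '%s\n' "$sums"
rm -f "$input"
```

With this sample the script prints `NM_A 25 2` followed by `NM_B 21 1`.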

    average of column 4 per id:

    awk '{sum4[$10] += $4; count4[$10]++}; END{ for (id in sum4) { print id, sum4[id]/count4[id] } }' < /tmp/input 
    NM_00175642 0.05
    NM_01011874 0.05
    

    Explained: Very similar to the summing example. We keep a sum of column 4 per id, and a count of records seen for each id. At the end, we iterate through the ids and print the sum/count.
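The averaging step can be sketched the same way. The data below is again invented for illustration (ids NM_A and NM_B are hypothetical). This variant uses printf with a fixed `%.2f` format instead of print, which is a choice of mine rather than part of the answer above: print converts numbers using awk's default output format, so a fixed format keeps the number of decimals predictable.

```shell
# Hypothetical sample: 10 whitespace-separated fields, id in field 10.
input=$(mktemp)
cat > "$input" <<'EOF'
chr1 100 200 0.04 10 1 a b c NM_A
chr1 300 400 0.06 15 1 a b c NM_A
chr2 500 600 0.25 21 1 a b c NM_B
EOF

# Keep a running sum of field 4 and a record count per id,
# then divide in the END block. Every id seen has count >= 1,
# so the division is safe.
avgs=$(LC_ALL=C awk '
    { sum4[$10] += $4; count4[$10]++ }
    END { for (id in sum4) printf "%s %.2f\n", id, sum4[id] / count4[id] }
' "$input" | LC_ALL=C sort)

printf '%s\n' "$avgs"
rm -f "$input"
```

With this sample the script prints `NM_A 0.05` followed by `NM_B 0.25`.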

    I don't do much with awk; I find Perl much better for small scripts. But this looks like a good starting point. There are links to more pages with example scripts.
