Can anyone tell me how to use awk in order to calculate the sum of two individuals columns or the average of one column for each unique ID.
Input
ch
sum of columns 5 and 6 per id:
awk '{sum5[$10] += $5; sum6[$10] += $6}; END{ for (id in sum5) { print id, sum5[id], sum6[id] } }' < /tmp/input
NM_00175642 25 2
NM_01011874 21 1
Explained: $10 is the id field, $5 and $6 are columns 5 and 6. We build 2 arrays for summing columns 5 and 6 (which are indexed by strings, so we can use the id field). Once we've processed all the lines/records, we iterate through the array keys (id strings), and print the value at that array index.
average of column 4 per id:
awk '{sum4[$10] += $4; count4[$10]++}; END{ for (id in sum4) { print id, sum4[id]/count4[id] } }' < /tmp/input
NM_00175642 0.05
NM_01011874 0.05
Explained: Very similar to the summing example. We keep a sum of column 4 per id, and a count of records seen for each id. At the end, we iterate through the ids and print the sum/count.
I don't do much with awk, I find Perl much better for small scripts. But this looks like a good starting point. There are links to more pages with example scripts.