I have the data below in a file named atp.csv:
2015-01-05 00:00:00 076,1941321748,BD9010423590206,200,Transaction Successfu
All in (g)awk
awk -F, 'NR>1{a[$4]++;b[$4]+=$6}
END{n=asorti(a,c);for(i=1;i<=n;i++)print c[i]","a[c[i]]","b[c[i]]}' file
You can also try this awk version:
awk -F',' '{print $4,",", a[$4]+=$6}' FileName | sort -r | uniq -cw 6 | sort -r
Output :
3 200 , 4500
1 351 , 5000
Another Way:
awk -F',' '{print $4,",", a[$4]+=$6}' FileName | sort -r | uniq -cw 6 |sort -r | sed 's/\([^ ]\+\).\([^ ]\+\).../\2,\1,/'
AWK has associative arrays, so the count and the sum per key can be collected in a single pass:
% cat atp.csv | awk -F, 'NR>1 {n[$4]+=1;s[$4]+=$6;} END {for (k in n) { print k "," n[k] "," s[k]; }}' | sort
In the above:
The first line (record) is skipped with NR>1.
n[k] is the number of occurrences of key k (so we add 1), and s[k] is the running sum of the values in field 6 (so we add $6).
Finally, after all records are processed (END), you can iterate over the associative arrays by key (for (k in n) { ... }) and print the key together with the values in arrays n and s associated with that key.
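To make the steps above concrete, here is a minimal sketch run against a few made-up rows. The column names and values in `sample` are assumptions for illustration only (the real atp.csv layout is not fully shown); the only things carried over from the answer are that field 4 is the key and field 6 is the amount being summed.

```shell
# Hypothetical sample rows (not the real atp.csv):
# field 4 is the grouping key, field 6 is the amount.
sample='date,msisdn,txn_id,status,message,amount
2015-01-05,1941321748,TXN001,200,ok,1000
2015-01-05,1941321749,TXN002,200,ok,1500
2015-01-05,1941321750,TXN003,351,failed,5000
2015-01-05,1941321751,TXN004,200,ok,2000'

summary=$(printf '%s\n' "$sample" |
  awk -F, '
    NR > 1 {          # skip the header record
      n[$4]++         # number of rows seen for this key
      s[$4] += $6     # running sum of field 6 for this key
    }
    END {
      # for (k in n) visits keys in an unspecified order,
      # so the output is piped through sort afterwards
      for (k in n) print k "," n[k] "," s[k]
    }' | sort)

printf '%s\n' "$summary"
```

With these rows it should print `200,3,4500` and `351,1,5000`, one per line. The final `sort` is there because plain awk does not guarantee any iteration order for `for (k in n)`.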