Question
I have a data file with two columns. I want to find the maximum value in the second column and divide each entry of the second column by that maximum, so that every entry in the second column is <= 1.00.
I tried the command below:
awk 'BEGIN {max = 0} {if ($2>max) max=$2} {print ($2/max)}' angleOut.dat
but I get the error message below:
awk: (FILENAME=angleOut.dat FNR=1) fatal: division by zero attempted
Note: some values in the second column are zero. Dividing zero by the maximum should give zero, but instead I get the error above.
Could I get any help with this?
Many thanks in advance.
Answer 1:
Let's take this as the sample input file:
$ cat >file
1 5
2 2
3 7
4 6
This awk script will normalize the second column:
$ awk 'FNR==NR{max=($2+0>max)?$2:max;next} {print $1,$2/max}' file file
1 0.714286
2 0.285714
3 1
4 0.857143
This script reads through the input file twice. The first time, it finds the maximum. The second time, it prints the lines with the second column normalized.
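To see how the FNR==NR test separates the two passes, here is a small sketch. The function name show_passes and the two-line sample data are made up for illustration. NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while the first copy of the file is being read.

```shell
# Hypothetical helper: label each record with the pass it belongs to.
show_passes() {
    # NR counts records across all inputs; FNR restarts at 1 for each file.
    awk '{ print NR, FNR, (FNR == NR ? "first pass" : "second pass") }' "$1" "$1"
}

tmp=$(mktemp)                    # throwaway sample file
printf '1 5\n2 2\n' > "$tmp"
show_passes "$tmp"
rm -f "$tmp"
```

With this two-line file, the first two output lines are tagged "first pass" and the last two "second pass". Note that the idiom assumes the file is not empty.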
The Ternary Statement
Consider:
max=($2+0>max)?$2:max
This is a compact form of an if-then-else statement. The "if" part is $2+0>max. If this evaluates to true, the value following the ? is assigned to max. If it is false, then the value following the : is assigned to max.
The more explicit form of an if statement works well too.
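As a sketch of that explicit form, the same two-pass script can be written with an ordinary if statement. The wrapper function normalize_col2 is a name chosen here for illustration; the awk logic is the same as in the one-liner above.

```shell
# Hypothetical wrapper: normalize column 2 of the given file by its maximum.
normalize_col2() {
    awk '
        FNR == NR {                  # first pass: only find the maximum
            if ($2 + 0 > max) {
                max = $2
            }
            next                     # do not print anything on this pass
        }
        { print $1, $2 / max }       # second pass: print normalized rows
    ' "$1" "$1"                      # the file is named twice, so it is read twice
}

tmp=$(mktemp)
printf '1 5\n2 2\n3 7\n4 6\n' > "$tmp"
normalize_col2 "$tmp"
rm -f "$tmp"
```

On the sample input this prints the same four lines as the ternary version.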
Also, note the incantation $2+0. In awk, variables can be strings or numbers according to context. In string context, > compares lexicographic ordering. We want a numeric comparison. By adding zero to $2, we are removing all doubt and forcing awk to treat $2 as a number.
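The difference between the two kinds of comparison can be seen in a minimal sketch. The function name compare_demo and the values "10" and "9" are chosen only for illustration. String constants are used because they always compare as strings; input fields that look like numbers are usually compared numerically already, and adding zero simply makes that explicit.

```shell
# Hypothetical demo: string comparison versus numeric comparison in awk.
compare_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"          # string constants, so > compares them as strings
        print (a > b)              # string comparison: "10" sorts before "9"
        print (a + 0 > b + 0)      # adding zero forces a numeric comparison
    }'
}

compare_demo
```

The first line printed is 0 (false), because the character 1 sorts before 9; the second is 1 (true), because 10 is greater than 9 as a number.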
Answer 2:
You cannot determine max before seeing the whole file, so you need two passes. This one uses two awk executions to get the normalized output:
awk -vmax=$(awk 'max < $2 { max = $2 } END { print max }' angleOut.dat) \
'{print $2 / max}' angleOut.dat
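The one-liner in the question most likely fails because it divides while max is still 0, which happens as soon as the first row has 0 in the second column. As an alternative sketch, not taken from either answer, the rows can be buffered in memory and divided only in the END block, once the maximum is known. The function name normalize_in_memory, the array names, and the zero-maximum guard are assumptions made for illustration, and the approach assumes the file fits in memory.

```shell
# Hypothetical helper: single read of the file, rows buffered until END.
normalize_in_memory() {
    awk '
        {
            key[NR] = $1               # remember column 1
            val[NR] = $2 + 0           # remember column 2 as a number
            if (NR == 1 || val[NR] > max) max = val[NR]
        }
        END {
            if (NR == 0) exit          # empty input: nothing to print
            if (max == 0) {            # guard against division by zero
                print "maximum is zero, cannot normalize" | "cat 1>&2"
                exit 1
            }
            for (i = 1; i <= NR; i++) print key[i], val[i] / max
        }
    ' "$1"
}

tmp=$(mktemp)
printf '1 0\n2 2\n3 7\n4 6\n' > "$tmp"   # first row has a zero in column 2
normalize_in_memory "$tmp"
rm -f "$tmp"
```

Here the zero in the first row is simply printed as 0, since the division only happens after the maximum (7) has been found.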
Source: https://stackoverflow.com/questions/29003301/normalize-column-data-with-maximum-value-of-that-column