Normalize column data by the maximum value of that column

Question

I have a data file with two columns. I want to find the maximum value in the second column and divide each entry of the second column by that maximum, so that every entry in the second column is <= 1.00.

I tried the command below:

awk 'BEGIN {max = 0} {if ($2>max) max=$2} {print  ($2/max)}' angleOut.dat

but I get the error message below:

awk: (FILENAME=angleOut.dat FNR=1) fatal: division by zero attempted

Note: some entries in the second column are zero. Dividing zero by the maximum should give zero, but instead I get the error above.

Could I get any help for this?

Many thanks in advance.

Answer 1:

Let's take this as the sample input file:

$ cat >file
1 5
2 2
3 7
4 6

This awk script will normalize the second column:

$ awk 'FNR==NR{max=($2+0>max)?$2:max;next} {print $1,$2/max}' file file
1 0.714286
2 0.285714
3 1
4 0.857143

This script reads through the input file twice. The first time, it finds the maximum. The second time, it prints the lines with the second column normalized.
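The two passes are told apart by the condition FNR==NR: NR counts records across all input files, while FNR restarts at 1 for each file, so the two are equal only while the first file argument is being read. A minimal sketch to make this visible (the function name show_passes is hypothetical, not part of the answer):

```shell
# Hypothetical helper: print NR, FNR, and which pass a record belongs to
# when the same file is given to awk twice.
show_passes() {
    # FNR == NR holds only while the first file argument is being read.
    awk '{ print NR, FNR, (FNR == NR ? "pass1" : "pass2") }' "$1" "$1"
}
```

Calling show_passes on a two-line file should label the first two records pass1 and the next two pass2.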

The Ternary Statement

Consider:

max=($2+0>max)?$2:max

This is a compact form of an if-then-else statement. The "if" part is $2+0>max. If this evaluates to true, the value following the ? is assigned to max. If it is false, then the value following the : is assigned to max.

The more explicit form of an if statement works well too.
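As a rough sketch, the same two-pass script written with an explicit if statement might look like the following. The wrapper function normalize_col2 is a hypothetical name, and initializing max from the first record (rather than relying on an unset max) is an assumption added here so that all-negative columns also work.

```shell
# Hypothetical helper: normalize column 2 of a file by its maximum,
# using an explicit if statement instead of the ternary operator.
normalize_col2() {
    awk '
        # First pass: FNR == NR is true only for the first file argument.
        FNR == NR {
            # Assumption: seed max with the first value, then keep the largest.
            if (FNR == 1 || $2 + 0 > max) {
                max = $2 + 0
            }
            next
        }
        # Second pass: print column 1 and the normalized column 2.
        { print $1, $2 / max }
    ' "$1" "$1"
}
```

With the sample file above, normalize_col2 file should print the same four lines as the ternary version. Like that version, it still fails with a division-by-zero error if the maximum itself is 0.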

Also, note the incantation $2+0. In awk, variables can be strings or numbers according to context. In string context, > compares lexicographic ordering. We want a numeric comparison. By adding zero to $2, we remove all doubt and force awk to treat $2 as a number.
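The difference between the two kinds of comparison can be illustrated with a small sketch. The function name compare_demo and the variables a and b are made up for this example; the values are assigned as string constants so that awk compares them as strings unless zero is added.

```shell
# Hypothetical demo: string comparison versus numeric comparison in awk.
compare_demo() {
    awk 'BEGIN {
        a = "10"; b = "9"                 # string constants
        as_strings = (a > b)              # lexicographic: "10" sorts before "9"
        as_numbers = (a + 0 > b + 0)      # adding 0 forces numeric comparison
        print as_strings, as_numbers
    }'
}
```

Running compare_demo prints 0 1: as strings, "10" is not greater than "9", but as numbers 10 is greater than 9.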

Answer 2:

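You cannot determine max before seeing the whole file, so you need two passes. The original command divides by a running maximum, which is still 0 on the first line whenever that line's second column is not greater than 0, hence the division-by-zero error at FNR=1. This one uses two awk executions to get the normalized output: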

awk -v max="$(awk 'max < $2 { max = $2 } END { print max }' angleOut.dat)" \
    '{print $2 / max}' angleOut.dat
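If the data arrives on a pipe and cannot be read twice, one possible variation (not taken from either answer) is to buffer the records in memory and normalize them in an END block. The sketch below assumes the input fits in memory; the function name normalize_stream is hypothetical, and printing 0 when the maximum is 0 is an assumed policy chosen to avoid the division-by-zero error.

```shell
# Hypothetical helper: normalize column 2 of standard input by its maximum,
# buffering all records so the input only needs to be read once.
normalize_stream() {
    awk '
        {
            key[NR] = $1          # remember column 1
            val[NR] = $2 + 0      # remember column 2 as a number
            if (NR == 1 || val[NR] > max) max = val[NR]
        }
        END {
            for (i = 1; i <= NR; i++) {
                # Assumption: emit 0 when max is 0 instead of dividing by zero.
                print key[i], (max != 0 ? val[i] / max : 0)
            }
        }
    '
}
```

For example, awk-free producers can be piped straight in: printf '1 5\n2 2\n' | normalize_stream.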

Source: https://stackoverflow.com/questions/29003301/normalize-column-data-with-maximum-value-of-that-column

Tags

awk

gawk