Question
I'm new to awk and I'm struggling to write an awk command that computes the standard deviation.
I have got the mean using the following:
echo ${GfieldList[@]} | awk 'NF {sum=0;for (i=1;i<=NF;i++)sum+=$i; print "Mean= " sum / NF; }'
Standard Deviation formula is:
sqrt((1/N)*(sum of (value - mean)^2))
I have already found the mean using the command above.
Can you guys help me with the awk command for this one?
Answer 1:
First compute the mean:
awk '{
for (i = 1;i <= NF; i++) {
sum += $i
};
print sum / NF
}' # for 2, 4, 4, 4, 5, 5, 7, 9 gives 5
then the standard deviation can be found thus:
awk -vM=5 '{
for (i = 1; i <= NF; i++) {
sum += ($i-M) * ($i-M)
};
print sqrt (sum / NF)
}' # for 2, 4, 4, 4, 5, 5, 7, 9 gives 2
In "compressed" form:
awk '{for(i=1;i<=NF;i++){sum+=$i};print sum/NF}'
awk -vM=5 '{for(i=1;i<=NF;i++){sum+=($i-M)*($i-M)};print sqrt(sum/NF)}'
(changing the value of M to the actual mean obtained from the first command).
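The two commands above can also be folded into a single awk invocation, so the mean does not have to be copied into M by hand. This is only a sketch of that idea: the function name mean_and_sd is made up for illustration, and it assumes every field on the line is a number.

```shell
# mean_and_sd: hypothetical helper name, not from the original answer.
# For each non-empty input line, prints the mean and the population
# standard deviation of the whitespace-separated numbers on that line.
mean_and_sd() {
  awk 'NF {
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i          # first loop: total of the fields
    mean = sum / NF
    sq = 0
    for (i = 1; i <= NF; i++)                    # second loop: squared deviations
      sq += ($i - mean) * ($i - mean)
    print "Mean=" mean, "StdDev=" sqrt(sq / NF)
  }'
}

echo 2 4 4 4 5 5 7 9 | mean_and_sd   # prints: Mean=5 StdDev=2
```

With the bash array from the question, the call would presumably look like echo "${GfieldList[@]}" | mean_and_sd.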
Answer 2:
An alternate formula for the standard deviation is the square root of the quantity: (the mean square minus the square of the mean). This is used below:
$ echo 20 21 22 | awk 'NF {sum=0;ssq=0;for (i=1;i<=NF;i++){sum+=$i;ssq+=$i**2}; print "Std Dev=" (ssq/NF-(sum/NF)**2)**0.5}'
Std Dev=0.816497
Notes:
In awk, NF is the number of "fields" on a line. In our case, every field is a number, so NF is the number of numbers on a given line.
ssq is the sum of the squares of each number on the line. Thus, ssq/NF is the mean square.
sum is the sum of the numbers on the line. Thus, sum/NF is the mean and (sum/NF)**2 is the square of the mean.
As per the formula, then, the standard deviation is (ssq/NF-(sum/NF)**2)**0.5.
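For readers wondering why "mean square minus square of the mean" matches the formula in the question, the identity can be sketched as follows. The symbols are notation introduced here, not taken from the answer: N stands for NF, x_i for field $i, and \mu for sum/NF.

```latex
\[
\sigma^2
  = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2
  = \frac{1}{N}\sum_{i=1}^{N}x_i^2 \;-\; 2\mu\cdot\frac{1}{N}\sum_{i=1}^{N}x_i \;+\; \mu^2
  = \frac{1}{N}\sum_{i=1}^{N}x_i^2 \;-\; \mu^2,
\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N}x_i .
\]
```

For the input 20 21 22 this gives 1325/3 - 21^2 = 2/3, whose square root is approximately 0.816497, in agreement with the output shown above.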
The awk code
NF
This serves as a condition: the statements which follow will only be executed if the number of fields on this line, NF, evaluates to true, meaning non-zero. In other words, this condition will cause empty lines to be skipped.
sum=0;ssq=0;
This initializes sum and ssq to zero. This is only needed if there is more than one line of input.
for (i=1;i<=NF;i++){sum+=$i;ssq+=$i**2}
This puts the sum of all the numbers in sum and the sum of the squares of the numbers in ssq.
print "Std Dev=" (ssq/NF-(sum/NF)**2)**0.5
This prints out the standard deviation.
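A possible variant of the same one-pass calculation is sketched below, with two changes that are not part of the original answer. It uses ^ and sqrt() in place of **, since ^ is the exponent operator POSIX specifies for awk while ** may not be accepted by every implementation. It also clamps the variance at zero, because floating-point rounding can leave a slightly negative value when all the numbers are nearly equal. The function name stddev_line is hypothetical.

```shell
# stddev_line: hypothetical helper name, not part of the original answer.
# One-pass population standard deviation per non-empty input line.
stddev_line() {
  awk 'NF {
    sum = 0; ssq = 0                        # reset for each input line
    for (i = 1; i <= NF; i++) {
      sum += $i                             # running total
      ssq += $i ^ 2                         # running total of squares
    }
    var = ssq / NF - (sum / NF) ^ 2         # mean square minus square of the mean
    if (var < 0) var = 0                    # assumption: treat rounding noise as zero
    print "Std Dev=" sqrt(var)
  }'
}

echo 20 21 22 | stddev_line   # prints: Std Dev=0.816497
```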
Source: https://stackoverflow.com/questions/28271599/need-to-calculate-standard-deviation-from-an-array-using-bash-and-awk