Need to calculate standard deviation from an array using bash and awk?

别等时光非礼了梦想. Submitted on 2019-12-10 22:15:20

Question


Guys, I'm new to awk and I'm struggling to write an awk command that finds the standard deviation.

I have got the mean using the following:

echo ${GfieldList[@]} | awk 'NF {sum=0;for (i=1;i<=NF;i++)sum+=$i; print "Mean= " sum / NF; }'

The standard deviation formula is:

sqrt((1/N)*(sum of (value - mean)^2))

I already have the mean from the command above.

Can you guys help me with the awk command for this one?


Answer 1:


First compute the mean:

awk '{
    for (i = 1;i <= NF; i++) {
        sum += $i
    };
    print sum / NF
}' # for 2, 4, 4, 4, 5, 5, 7, 9 gives 5

then the standard deviation can be found thus:

awk -v M=5 '{
    for (i = 1; i <= NF; i++) {
        sum += ($i-M) * ($i-M)
    };
    print sqrt (sum / NF)
}' # for 2, 4, 4, 4, 5, 5, 7, 9 gives 2

In "compressed" form:

awk '{for(i=1;i<=NF;i++){sum+=$i};print sum/NF}'
awk -v M=5 '{for(i=1;i<=NF;i++){sum+=($i-M)*($i-M)};print sqrt(sum/NF)}'

(Replace the value of M with the actual mean printed by the first command.)
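The two commands can be chained so the mean does not have to be copied by hand. The following is only a sketch: the function name `stddev_two_pass` is made up for illustration, `GfieldList` is the array name from the question filled with sample values, and the mean is printed with `%.17g` (my choice, not part of the answer) so that it is not rounded to six significant digits before the second pass.

```shell
#!/usr/bin/env bash
# Hypothetical helper: two-pass population standard deviation of its arguments.
stddev_two_pass() {
    local mean
    # Pass 1: mean of all fields on the line (NF guard skips empty input).
    mean=$(echo "$@" | awk 'NF {
        sum = 0
        for (i = 1; i <= NF; i++) sum += $i
        printf "%.17g\n", sum / NF
    }')
    # Pass 2: square root of the mean squared deviation; the mean arrives as M.
    echo "$@" | awk -v M="$mean" 'NF {
        sum = 0
        for (i = 1; i <= NF; i++) sum += ($i - M) * ($i - M)
        print sqrt(sum / NF)
    }'
}

# Sample data from the answer above; the array name comes from the question.
GfieldList=(2 4 4 4 5 5 7 9)
stddev_two_pass "${GfieldList[@]}"
```

For the sample values this prints 2, matching the result quoted in the answer.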




Answer 2:


An alternative formula for the standard deviation is the square root of (the mean of the squares minus the square of the mean). It is used below:

$ echo 20 21 22 | awk 'NF {sum=0;ssq=0;for (i=1;i<=NF;i++){sum+=$i;ssq+=$i**2}; print "Std Dev=" (ssq/NF-(sum/NF)**2)**0.5}'
Std Dev=0.816497

Notes:

  • In awk, NF is the number of "fields" on a line. In our case, every field is a number, so NF is the number of numbers on a given line.

  • ssq is the sum of the squares of each number on the line. Thus, ssq/NF is the mean square.

  • sum is the sum of the numbers on the line. Thus sum/NF is the mean and (sum/NF)**2 is the square of the mean.

  • As per the formula, then, the standard deviation is (ssq/NF-(sum/NF)**2)**0.5.
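That this formula agrees with the one in the question can be seen by expanding the square. A short derivation, where the symbols x_i (the values), N (their count, NF in awk) and μ (their mean, sum/NF in awk) are notation introduced here rather than taken from the answer:

```latex
\begin{aligned}
\sigma^2 &= \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu)^2 \\
         &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 \;-\; 2\mu\cdot\frac{1}{N}\sum_{i=1}^{N}x_i \;+\; \mu^2 \\
         &= \frac{1}{N}\sum_{i=1}^{N}x_i^2 \;-\; \mu^2
         \qquad\text{since } \mu=\frac{1}{N}\sum_{i=1}^{N}x_i .
\end{aligned}
```

The last line is ssq/NF minus (sum/NF) squared, and its square root is the standard deviation.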

The awk code

  • NF

    This serves as a condition: the statements which follow will only be executed if the number of fields on this line, NF, evaluates to true, meaning non-zero. In other words, this condition will cause empty lines to be skipped.

  • sum=0;ssq=0;

    This initializes sum and ssq to zero. This is only needed if there is more than one line of input, because awk variables keep their values from one line to the next.

  • for (i=1;i<=NF;i++){sum+=$i;ssq+=$i**2}

    This puts the sum of all the numbers in sum and the sum of the square of the numbers in ssq.

  • print "Std Dev=" (ssq/NF-(sum/NF)**2)**0.5

    This prints out the standard deviation.
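Putting the pieces above together for a bash array might look like the sketch below. The function name `stddev_one_pass` is hypothetical, and two details differ from the one-liner by my own choice: `^` is used instead of `**` because `^` is the exponent operator POSIX awk specifies, and a slightly negative variance (which floating-point rounding can produce when all values are nearly equal) is clamped to zero before taking the square root.

```shell
#!/usr/bin/env bash
# Hypothetical helper: one-pass population standard deviation of its arguments.
stddev_one_pass() {
    echo "$@" | awk 'NF {
        sum = 0; ssq = 0
        for (i = 1; i <= NF; i++) {
            sum += $i        # running sum of the values
            ssq += $i ^ 2    # running sum of their squares
        }
        var = ssq / NF - (sum / NF) ^ 2   # mean square minus square of the mean
        if (var < 0) var = 0              # guard against rounding below zero
        print "Std Dev=" sqrt(var)
    }'
}

# Same sample values as the echo example above; array name from the question.
GfieldList=(20 21 22)
stddev_one_pass "${GfieldList[@]}"
```

With the values 20 21 22 this prints Std Dev=0.816497, the same result as the one-liner.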



Source: https://stackoverflow.com/questions/28271599/need-to-calculate-standard-deviation-from-an-array-using-bash-and-awk
