Compute average and standard deviation with awk

误落风尘 2020-12-14 16:43

I have a file 'file.dat' containing 24 rows x 16 columns of data.

I have already tested an awk script that computes the average of each column.

4 Answers
  • 2020-12-14 17:17

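    Your script should look something like this instead. Note that, as written, it computes the mean and standard deviation across the fields of each row, not of each column: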

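    awk '{
        # first pass over the fields of this row: mean
        sum = 0
        for (i=1; i<=NF; i++) {
            sum += $i
        }
        avg = sum / NF
        avga[NR] = avg
        # second pass: sum of squared deviations from the mean
        sum = 0
        for (i=1; i<=NF; i++) {
            sum += ($i - avg) ^ 2
        }
        stda[NR] = sqrt(sum / NF)    # population standard deviation of the row
    }

    # print one "mean stdev" pair per input row
    END { for (i = 1; i in stda; ++i) { printf "%f %f \n", avga[i], stda[i] } }' file.dat >> aver-std.dat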
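
    To see what this two-pass, per-row calculation produces, here is a minimal sketch on a tiny made-up input. The function name row_stats and the sample numbers are assumptions for illustration only; the sketch prints each row as it is read instead of storing the results in arrays, and it skips empty lines to avoid dividing by NF = 0.

    ```shell
    # row_stats: hypothetical helper; prints "mean stdev" for each row of stdin.
    row_stats() {
        awk 'NF > 0 {
            sum = 0
            for (i = 1; i <= NF; i++) sum += $i
            avg = sum / NF                      # mean of the fields in this row
            ss = 0
            for (i = 1; i <= NF; i++) ss += ($i - avg) ^ 2
            printf "%.4f %.4f\n", avg, sqrt(ss / NF)   # population stdev (divide by NF)
        }'
    }

    # Made-up input: 2 rows x 3 columns.
    printf '1 2 3\n4 4 4\n' | row_stats
    # prints:
    # 2.0000 0.8165
    # 4.0000 0.0000
    ```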
    
  • 2020-12-14 17:18

    The (population) standard deviation is

    stdev = sqrt((1/N)*(sum of (value - mean)^2))
    

    But there is another form of the formula which does not require you to know the mean beforehand. It is:

    stdev = sqrt((1/N)*((sum of squares) - (((sum)^2)/N)))
    

    (A quick web search for "sum of squares" formula for standard deviation will give you the derivation if you are interested)
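
    For reference, the equivalence of the two forms follows from expanding the square. A short sketch, where the symbols x_i (the values), N (their count) and \mu (their mean) are notation introduced here:

    ```latex
    \begin{aligned}
    \sum_{i=1}^{N} (x_i - \mu)^2
      &= \sum_{i=1}^{N} x_i^2 \;-\; 2\mu \sum_{i=1}^{N} x_i \;+\; N\mu^2 \\
      &= \sum_{i=1}^{N} x_i^2 \;-\; \frac{\left(\sum_{i=1}^{N} x_i\right)^2}{N},
      \qquad \text{since } \mu = \frac{1}{N}\sum_{i=1}^{N} x_i .
    \end{aligned}
    ```

    Dividing by N and taking the square root gives the second formula.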

    To use this formula, you need to keep track of both the sum and the sum of squares of the values. So your awk script will change to:

    awk '{ for (i = 1; i <= NF; i++) { sum[i] += $i; sumsq[i] += ($i)^2 } }
         END {
             for (i = 1; i <= NF; i++) {
                 printf "%f %f \n", sum[i]/NR, sqrt((sumsq[i] - sum[i]^2/NR)/NR)
             }
         }' file.dat >> aver-std.dat
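
    The one-pass, per-column calculation can be tried on a small input as follows. This is only a sketch: the function name col_stats and the sample data are made up, and two details are assumptions added here rather than part of the script above. It remembers the column count itself instead of relying on NF in the END block, and it clamps the variance at zero, because rounding can make the difference of the two terms slightly negative for a (nearly) constant column.

    ```shell
    # col_stats: hypothetical helper; prints "mean stdev" for each column of stdin.
    col_stats() {
        awk '{
            for (i = 1; i <= NF; i++) { sum[i] += $i; sumsq[i] += $i * $i }
            if (NF > ncol) ncol = NF            # remember the widest row seen
        }
        END {
            for (i = 1; i <= ncol; i++) {
                mean = sum[i] / NR
                var  = sumsq[i] / NR - mean * mean
                if (var < 0) var = 0            # guard against negative rounding error
                printf "%.4f %.4f\n", mean, sqrt(var)
            }
        }'
    }

    # Made-up input: 4 rows x 2 columns.
    printf '1 10\n2 10\n3 10\n4 10\n' | col_stats
    # prints:
    # 2.5000 1.1180
    # 10.0000 0.0000
    ```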
    
  • 2020-12-14 17:19

    To simply calculate the population standard deviation of a list of numbers, you can use a command like this:

    awk '{x+=$0;y+=$0^2}END{print sqrt(y/NR-(x/NR)^2)}'
    

    Or this calculates the sample standard deviation:

    awk '{sum+=$0;a[NR]=$0}END{for(i in a)y+=(a[i]-(sum/NR))^2;print sqrt(y/(NR-1))}'
    

    The ^ operator is specified by POSIX. The ** operator is supported by gawk and nawk but not by mawk.
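
    The difference between the two estimators is easiest to see on a small list. The sketch below uses made-up numbers and hypothetical function names (pop_sd, sample_sd). Unlike the array-based command above, sample_sd here uses the one-pass sum-of-squares form, which is an assumption of this example rather than the command shown earlier.

    ```shell
    # pop_sd: population standard deviation (divides by N).
    pop_sd() {
        awk '{ x += $0; y += $0^2 } END { printf "%.4f\n", sqrt(y/NR - (x/NR)^2) }'
    }

    # sample_sd: sample standard deviation (divides by N-1); needs at least 2 values.
    sample_sd() {
        awk '{ x += $0; y += $0^2 } END { printf "%.4f\n", sqrt((y - x^2/NR) / (NR - 1)) }'
    }

    # Made-up data: mean 5, sum of squared deviations 32.
    printf '%s\n' 2 4 4 4 5 5 7 9 | pop_sd      # prints 2.0000
    printf '%s\n' 2 4 4 4 5 5 7 9 | sample_sd   # prints 2.1381
    ```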

  • 2020-12-14 17:27

    Here are some calculations I made on a Grinder data output file from a long soak test that had to be interrupted:

    Standard deviation (biased) + average:

    cat <grinder_data_file> | grep -v "1$" | awk -F ', '  '{   sum=sum+$5 ; sumX2+=(($5)^2)} END { printf "Average: %f. Standard Deviation: %f \n", sum/NR, sqrt(sumX2/(NR) - ((sum/NR)^2) )}'
    

    Standard deviation (non-biased) + average:

    cat <grinder_data_file>  | grep -v "1$" | awk -F ', '  '{   sum=sum+$5 ; sumX2+=(($5)^2)} END { avg=sum/NR; printf "Average: %f. Standard Deviation: %f \n", avg, sqrt(sumX2/(NR-1) - 2*avg*(sum/(NR-1)) + ((NR*(avg^2))/(NR-1)))}'
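
    The three-term expression under the square root in the non-biased version is the expanded form of the usual N-1 estimator. A short sketch of the algebra, writing N for NR, S for sum, Q for sumX2 and \bar{x} = S/N for avg (this notation is introduced here):

    ```latex
    \begin{aligned}
    \frac{Q}{N-1} - 2\bar{x}\,\frac{S}{N-1} + \frac{N\bar{x}^2}{N-1}
      &= \frac{Q - 2N\bar{x}^2 + N\bar{x}^2}{N-1}
         \qquad (\text{using } S = N\bar{x}) \\
      &= \frac{Q - N\bar{x}^2}{N-1}
       = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2 .
    \end{aligned}
    ```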
    