Standard deviation of multiple files having different row sizes

Submitted by 老子叫甜甜 on 2020-07-08 03:41:26

Question


This question is related to my previous one, "Average of multiple files having different row sizes".

I have a few files with different numbers of rows, but the number of columns is the same in every file, e.g.:

ifile1.txt

1       1001    ?       ?
2       1002    ?       ?
3       1003    ?       ?
4       1004    ?       ?
5       1005    ?       0
6       1006    ?       1
7       1007    ?       3
8       1008    5       4
9       1009    3       11
10      1010    2       9

ifile2.txt

1       2001    ?       ?
2       2002    ?       ?
3       2003    ?       ?
4       2004    ?       ?
5       2005    ?       0
6       2006    6       12
7       2007    6       5
8       2008    9       10
9       2009    3       12
10      2010    5       7
11      2011    2       ?
12      2012    9       ?

ifile3.txt

1       3001    ?       ?
2       3002    ?       6
3       3003    ?       ?
4       3004    ?       ?
5       3005    ?       0
6       3006    1       25
7       3007    2       3
8       3008    ?       ?

In each file, the 1st column is the index number and the 2nd column is the ID. I would like to calculate the standard deviation for each index number, across files, for every column from the 3rd onward.

The desired output:

1       ?       ?          ----  [Here ? is computed from ?, ?, ?] So the answer is ?
2       ?       ?          ----  [Here ? is computed from ?, ?, 6] So the answer is ?, as there is only one sample
3       ?       ?
4       ?       ?
5       ?       0.00       ----- [Here 0.00 is computed from 0, 0, 0] So the answer is 0, as all values are the same
6       3.54    12.01
7       2.83    1.15
8       2.83    4.24       ----- [Here 2.83 is computed from 5, 9, ?]
9       0.00    0.71
10      2.12    1.41
11      ?       ?
12      ?       ?
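
The expected values above are consistent with the sample standard deviation (dividing by n - 1), with ? reported whenever fewer than two numeric values exist for an index. As a quick check under that assumption, the sketch below recomputes a single cell: index 8, 3rd column, where the values are 5 and 9. The helper name `sample_sd` is hypothetical and not part of the original post.

```shell
#!/bin/sh
# Recompute one cell of the desired output: index 8, 3rd column -> values 5 and 9.
# sample_sd is a hypothetical helper; it prints "?" when fewer than two values are given.
sample_sd() {
    printf '%s\n' "$@" | awk '
        { sum += $1; sumsq += $1 * $1; n++ }
        END {
            if (n < 2) { print "?"; exit }
            var = (sumsq - sum * sum / n) / (n - 1)   # sample variance
            if (var < 0) var = 0                      # guard against rounding error
            printf "%.2f\n", sqrt(var)
        }'
}

sample_sd 5 9
```

With the values 5 and 9 the mean is 7, the squared deviations sum to 8, and the square root of 8 / (2 - 1) is 2.83, which matches the row for index 8.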
  

I was trying to modify the following script, but I am not able to define the array:

awk '
{
    c = NF
    if (r<FNR) r = FNR

    for (i=3;i<=NF;i++) {
        if ($i != "?") {
            s[FNR "," i] += $i
            n[FNR "," i] += 1
        }
    }
}

END {
    for (i=1;i<=r;i++) {
        printf("%s\t", i)
        for (j=3;j<=c;j++) {
            if (n[i "," j]) {
              mean=s[i "," j]/n[i "," j]
              for (i=1; i in array ; i++)
                sqdif+=(array[i]-mean)**2
                 printf("%.1f\t", sqdif/(n[i "," j]-1)**0.5)
            } else {
                printf("?\t")
            }
        }
        printf("\n")
    }
}

' ifile*
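
One possible way around the missing array is to avoid storing the individual values at all: if a running sum of squares is kept next to the running sum and count, the sample variance can be computed in the END block as (sumsq - sum^2 / n) / (n - 1). The following is a minimal sketch of that idea, not a definitive answer. It assumes, like the script above, that the line number (FNR) equals the index in the 1st column; the wrapper name `sd_by_index` and the array name `s2` are introduced here for illustration.

```shell
#!/bin/sh
# sd_by_index is a hypothetical wrapper; it prints one row per index with the
# sample standard deviation of every column from the 3rd onward.
sd_by_index() {
    awk '
    {
        if (NF > c)  c = NF        # widest row seen
        if (FNR > r) r = FNR       # longest file seen
        for (i = 3; i <= NF; i++) {
            if ($i != "?") {
                s[FNR, i]  += $i        # running sum
                s2[FNR, i] += $i * $i   # running sum of squares
                n[FNR, i]++             # count of numeric values
            }
        }
    }
    END {
        for (i = 1; i <= r; i++) {
            printf "%d", i
            for (j = 3; j <= c; j++) {
                k = n[i, j]
                if (k > 1) {
                    var = (s2[i, j] - s[i, j] * s[i, j] / k) / (k - 1)
                    if (var < 0) var = 0    # guard against rounding error
                    printf "\t%.2f", sqrt(var)
                } else {
                    printf "\t?"            # fewer than two samples
                }
            }
            printf "\n"
        }
    }' "$@"
}
```

It could be called as `sd_by_index ifile*.txt`. The sum-of-squares form is compact but can lose precision when values are large and close together; for data like that, storing the values per index and making a second pass over them would be the safer choice.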

Source: https://stackoverflow.com/questions/62778563/standard-deviation-of-multiple-files-having-different-row-sizes
