Selecting columns using specific patterns then finding sum and ratio

谁说我不能喝 提交于 2019-12-11 10:08:50

问题


I want to calculate the sum and ratio values from data below. (The actual data contains more than 200,000 columns and 45000 rows (lines)).

For clarity purpose I have given only simple data format.

#Frame  BMR_42@O22  BMR_49@O13  BMR_59@O13  BMR_23@O26  BMR_10@O13  BMR_61@O26  BMR_23@O25 
 1      1           1           0           1           1           1           1
 2      0           1           0           0           1           1           0
 3      1           1           1           0           0           1           1
 4      1           1           0           0           1           0           1
 5      0           0           0           0           0           0           0
 6      1           0           1           1           0           1           0
 7      1           1           1           1           0           0           0
 8      1           1           1           0           0           0           0
 9      1           1           1           1           1           1           1
10      0           0           0           0           0           0           0

The columns need to be selected with certain criteria.

The column data which I consider is columns with "@O13" only. Below I have given the selected columns from above example.

BMR_49@O13  BMR_59@O13  BMR_10@O13  
1           0           1       
1           0           1       
1           1           0       
1           0           1       
0           0           0       
0           1           0       
1           1           0       
1           1           0       
1           1           1       
0           0           0   

From the selected column, I want to calculate:

1) the sum of all the "1"s. In this example we get value 16.

2) the number of total rows containing occurrence of "1" (at least once). From above example there are 8 rows which contain at least one occurrence of "1".

lastly,

3) the ratio of total of all "1"s with total lines with occurrence of "1"s.

That is :: (total of all "1"s)/(total rows with the occurance of "1"). Example 16/8

As a start, I tried with this command to select only the columns with "@O13"

awk '{for (i=1;i<=NF;i++) if (i~/@O13/); print ""}' $file2

Although this run but doesn't show up the values.


回答1:


This should do:

awk 'NR==1{for (i=1;i<=NF;i++) if ($i~/@O13/) a[i];next} {f=0;for (i in a) if ($i) {s++;f++};if (f) r++} END {print "number of 1="s"\nrows with 1="r"\nratio="s/r}' file
number of 1=16
rows with 1=8
ratio=2

Some more readable:

awk '
NR==1{
    for (i=1;i<=NF;i++) 
        if ($i~/@O13/)
            a[i]
        next
    }
    {
    f=0
    for (i in a)
        if ($i=="1") {
            s++
            f++
        }
    if (f) r++
    } 
END {
    print   "number of 1="s \
            "\nrows with 1="r \
            "\nratio="s/r
    }
' file


来源:https://stackoverflow.com/questions/29084269/selecting-columns-using-specific-patterns-then-finding-sum-and-ratio

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!