remove lines based on value of two columns

遇见更好的自我 2021-01-28 05:34

I have a huge file (my_file.txt) with ~ 8,000,000 lines that looks like this:

1   13110   13110   rs540538026 0   NA  -1.33177622457982
1   13116   13116   rs626         


        
1 answer
  • 2021-01-28 06:00
    $ awk '(i=$1 FS $2 FS $3) && !(i in seventh) || seventh[i] < $7 {seventh[i]=$7; all[i]=$0} END {for(i in all) print all[i]}' my_file.txt
    1   13013178    13013178    rs11122075  0   NA  -1.57404917386838
    1   13116   13116   rs62635286  0   NA  -2.87540758021667
    1   13118   13118   rs200579949 0   NA  -2.87540758021667
    1   13110   13110   rs540538026 0   NA  -1.33177622457982
    

    Thanks to @fedorqui for the advanced indexing. :D

    Explained:

    (i=$1 FS $2 FS $3) && !(i in seventh) || $7 > seventh[i] {
                                          # set the index i to the first 3 fields,
                                          # AND if that index is not yet stored in the array,
                                          # OR the seventh field is greater than the value stored for the same index:
        seventh[i]=$7                     # new biggest value
        all[i]=$0                         # store that record
    } 
    END {
        for(i in all)                     # for all stored records of the biggest seventh value
            print all[i]                  # print them
    }
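
To try the idea on something small first, here is a minimal sketch of the same "keep the row with the largest 7th field per key" logic, written out in long form. The sample rows, the rs IDs, and the array names `max` and `best` are made up for illustration and are not from the question's file. Two small departures from the one-liner above: `$7 + 0` forces a numeric comparison, and the result is piped through `sort`, because `for (key in best)` visits keys in an unspecified order.

```shell
#!/bin/sh
# Hypothetical sample data: rows sharing fields 1-3 differ only in field 7.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
1 13110 13110 rsA 0 NA -1.5
1 13110 13110 rsA 0 NA -1.2
1 13116 13116 rsB 0 NA -2.9
1 13116 13116 rsB 0 NA -3.4
1 13118 13118 rsC 0 NA -0.7
EOF

# For each key (fields 1-3), remember the record with the largest field 7.
result=$(awk '
{
    key = $1 FS $2 FS $3                      # composite key from the first three fields
    if (!(key in max) || $7 + 0 > max[key]) { # first time seen, or a larger 7th field
        max[key]  = $7 + 0                    # largest value so far for this key
        best[key] = $0                        # the whole record that carries it
    }
}
END { for (key in best) print best[key] }     # one record per key, order unspecified
' "$tmp" | sort -k2,2n)                       # sort by field 2 for a stable order

printf '%s\n' "$result"
rm -f "$tmp"
```

With this sample the script should print one line per key: the `-1.2` row for 13110, the `-2.9` row for 13116, and the single 13118 row. Like the one-liner, it holds one record per distinct key in memory, which matters on an 8,000,000-line file only if most keys are unique.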
    