Subset only those rows whose intervals do not fall within another data.frame

守給你的承諾、 submitted on 2020-01-15 12:49:10

Question


How can I compare two data frames (test and control) of unequal length and remove a row from test when all three of the following criteria hold: i) test$chr == control$chr, ii) test$start and test$end lie within the range control$start to control$end, iii) test$CNA and control$CNA are the same?

    test =
        R_level  logp   chr  start  end   CNA   Gene
        2        7.079  11   1159   1360  gain  Recl,Bcl
        11       2.4    12   6335   6345  loss  Pekg
        3        19     13   7180   7229  loss  Sox1

    control =
        R_level  logp   chr  start  end   CNA   Gene
        2        5.9    11   1100   1400  gain  Recl,Bcl
        2        3.46   11   1002   1345  gain  Trp1
        2        6.4    12   6705   6845  gain  Pekg
        4        7      13   6480   8129  loss  Sox1

The result should look something like this:

    result =
        R_level  logp   chr  start  end   CNA   Gene
        11       2.4    12   6335   6345  loss  Pekg

Answer 1:


Here's one way using foverlaps() from data.table.

require(data.table) # v1.9.4+
dt1 <- as.data.table(test)
dt2 <- as.data.table(control)
setkey(dt2, chr, CNA, start, end)

olaps = foverlaps(dt1, dt2, nomatch=0L, which=TRUE, type="within")
#    xid yid
# 1:   1   2
# 2:   3   4

dt1[!olaps$xid]
#    R_level logp chr start  end  CNA Gene
# 1:      11  2.4  12  6335 6345 loss Pekg

Read ?foverlaps and see the examples section for more info.

Alternatively, you can use the GenomicRanges package. However, as far as I can tell, you would have to filter on CNA yourself after finding the overlapping regions.
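A minimal sketch of that GenomicRanges route, assuming the Bioconductor packages GenomicRanges and IRanges are installed. The data frames are rebuilt from the question's tables; `same_cna` and `drop` are names introduced here for illustration only.

```r
library(GenomicRanges)

# Rebuild the question's tables (values copied from the question).
test <- data.frame(
  R_level = c(2L, 11L, 3L),
  logp    = c(7.079, 2.4, 19),
  chr     = c(11L, 12L, 13L),
  start   = c(1159L, 6335L, 7180L),
  end     = c(1360L, 6345L, 7229L),
  CNA     = c("gain", "loss", "loss"),
  Gene    = c("Recl,Bcl", "Pekg", "Sox1"),
  stringsAsFactors = FALSE
)
control <- data.frame(
  R_level = c(2L, 2L, 2L, 4L),
  logp    = c(5.9, 3.46, 6.4, 7),
  chr     = c(11L, 11L, 12L, 13L),
  start   = c(1100L, 1002L, 6705L, 6480L),
  end     = c(1400L, 1345L, 6845L, 8129L),
  CNA     = c("gain", "gain", "gain", "loss"),
  Gene    = c("Recl,Bcl", "Trp1", "Pekg", "Sox1"),
  stringsAsFactors = FALSE
)

# One range per row; chr is used as the sequence name.
gr_test    <- GRanges(seqnames = as.character(test$chr),
                      ranges   = IRanges(test$start, test$end))
gr_control <- GRanges(seqnames = as.character(control$chr),
                      ranges   = IRanges(control$start, control$end))

# Pairs where a test range lies entirely within a control range on the same chr.
hits <- findOverlaps(gr_test, gr_control, type = "within")

# The overlap step knows nothing about CNA, so compare it per pair afterwards.
same_cna <- test$CNA[queryHits(hits)] == control$CNA[subjectHits(hits)]
drop     <- unique(queryHits(hits)[same_cna])

# setdiff() keeps every row when nothing matched (a bare test[-drop, ] would not).
result <- test[setdiff(seq_len(nrow(test)), drop), ]
print(result)
```

On the question's data this should leave only the chr 12 row (Pekg), matching the foverlaps() result above.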




Answer 2:


When you say "exclude the variable", I assume you mean you want to remove the rows that satisfy those criteria.

If so, you are nearly there. The following should work:

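# data1 is the test table, data2 the control table (columns: 3 = chr, 4 = start, 5 = end, 6 = CNA).
# The tables differ in length, so check each data1 row against every data2 row.
exclude_bool <- vapply(seq_len(nrow(data1)), function(i) any(
  data1[i, 3] == data2[, 3] &    # same chr
  data1[i, 4] >= data2[, 4] &    # start not before the control start
  data1[i, 5] <= data2[, 5] &    # end not after the control end
  data1[i, 6] == data2[, 6]      # same CNA
), logical(1))

data1 <- data1[!exclude_bool, ]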
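Because test and control have different numbers of rows, another dependency-free option is to pair rows with merge() on chr and CNA first and then filter on the interval. This is a sketch in base R, not the answer's own method; `row_id` is a helper column introduced here, and the tables are rebuilt from the question.

```r
# Rebuild the question's tables from whitespace-separated text.
test <- read.table(text = "
R_level logp  chr start end  CNA  Gene
2       7.079 11  1159  1360 gain Recl,Bcl
11      2.4   12  6335  6345 loss Pekg
3       19    13  7180  7229 loss Sox1
", header = TRUE, stringsAsFactors = FALSE)

control <- read.table(text = "
R_level logp chr start end  CNA  Gene
2       5.9  11  1100  1400 gain Recl,Bcl
2       3.46 11  1002  1345 gain Trp1
2       6.4  12  6705  6845 gain Pekg
4       7    13  6480  8129 loss Sox1
", header = TRUE, stringsAsFactors = FALSE)

# Tag each test row so it can be traced back after the merge.
test$row_id <- seq_len(nrow(test))

# Every test/control pair sharing chr and CNA (criteria i and iii).
m <- merge(test, control, by = c("chr", "CNA"), suffixes = c(".t", ".c"))

# Criterion ii: the test interval lies within the control interval.
hit <- m$start.t >= m$start.c & m$end.t <= m$end.c

# Keep test rows that never satisfied all three criteria; drop the helper column.
result <- test[!test$row_id %in% m$row_id[hit], names(test) != "row_id"]
print(result)
```

The merge grows with the number of rows sharing a chr and CNA, so for large tables the foverlaps() approach in Answer 1 is likely the better fit.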


Source: https://stackoverflow.com/questions/28109327/subset-only-those-rows-whose-intervals-does-not-fall-within-another-data-frame
