Question
How can I compare two data frames (test and control) of unequal length and remove rows from test based on three criteria: i) test$chr == control$chr, ii) test$start and test$end lie within the range of control$start to control$end, and iii) test$CNA and control$CNA are the same?
test =
R_level  logp   chr  start  end   CNA   Gene
2        7.079  11   1159   1360  gain  Recl,Bcl
11       2.4    12   6335   6345  loss  Pekg
3        19     13   7180   7229  loss  Sox1
control =
R_level  logp   chr  start  end   CNA   Gene
2        5.9    11   1100   1400  gain  Recl,Bcl
2        3.46   11   1002   1345  gain  Trp1
2        6.4    12   6705   6845  gain  Pekg
4        7      13   6480   8129  loss  Sox1
The result should look something like this:
result =
R_level  logp   chr  start  end   CNA   Gene
11       2.4    12   6335   6345  loss  Pekg
Answer 1:
Here's one way using foverlaps() from data.table.
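require(data.table) # v1.9.4+
dt1 <- as.data.table(test)
dt2 <- as.data.table(control)
# key the control table; the last two key columns define the interval
setkey(dt2, chr, CNA, start, end)
# row indices of test (xid) lying entirely within a control row (yid) with the same chr and CNA
olaps = foverlaps(dt1, dt2, nomatch=0L, which=TRUE, type="within")
#    xid yid
# 1:   1   2
# 2:   3   4
# drop the matched test rows
dt1[!olaps$xid]
#    R_level logp chr start  end  CNA Gene
# 1:      11  2.4  12  6335 6345 loss Pekg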
Read ?foverlaps and see the examples section for more info.
Alternatively, you can also use the GenomicRanges package. However, you might have to filter based on CNA after merging by overlapping regions (AFAICT).
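As a rough sketch of the GenomicRanges route (assuming the Bioconductor package GenomicRanges is installed and that test and control have the columns shown in the question), one could first find the test intervals that lie within a control interval on the same chromosome and then apply the CNA filter as a separate step. The helper to_granges() and the names gr_test, gr_ctrl, hits and drop are illustrative only; the data are re-entered from the question so the snippet runs on its own.

```r
library(GenomicRanges)  # Bioconductor; also attaches IRanges and S4Vectors

# Example data copied from the question
test <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
R_level logp chr start end CNA Gene
2 7.079 11 1159 1360 gain Recl,Bcl
11 2.4 12 6335 6345 loss Pekg
3 19 13 7180 7229 loss Sox1")
control <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
R_level logp chr start end CNA Gene
2 5.9 11 1100 1400 gain Recl,Bcl
2 3.46 11 1002 1345 gain Trp1
2 6.4 12 6705 6845 gain Pekg
4 7 13 6480 8129 loss Sox1")

# Hypothetical helper: build a GRanges object from the chr/start/end columns
to_granges <- function(df) {
  GRanges(seqnames = as.character(df$chr),
          ranges   = IRanges(start = df$start, end = df$end))
}
gr_test <- to_granges(test)
gr_ctrl <- to_granges(control)

# Pairs (test row, control row) where the test interval lies within the control interval
hits <- findOverlaps(gr_test, gr_ctrl, type = "within")

# Separate filtering step: keep only pairs whose CNA values agree
same_cna <- test$CNA[queryHits(hits)] == control$CNA[subjectHits(hits)]
drop <- unique(queryHits(hits)[same_cna])

# Remove the matched test rows
result <- test[!seq_len(nrow(test)) %in% drop, ]
```

On the example data this should leave only the Pekg row of test.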
Answer 2:
When you say "exclude the variable", I assume you mean you want to remove the rows that satisfy those criteria.
If so, you are nearly there. The following should work:
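# data1 = test, data2 = control; columns: 3 = chr, 4 = start, 5 = end, 6 = CNA
# The data frames differ in length, so check each data1 row against all data2 rows
exclude_bool <- sapply(seq_len(nrow(data1)), function(i)
  any(data1[i, 3] == data2[, 3] &
      data1[i, 4] >= data2[, 4] &
      data1[i, 5] <= data2[, 5] &
      as.character(data1[i, 6]) == as.character(data2[, 6])))
data1 <- data1[!exclude_bool , ]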
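For comparison, here is a minimal base R sketch that handles the unequal lengths without row-wise indexing: it pairs every test row with the control rows sharing chr and CNA via merge(), then keeps only the test rows that have no enclosing control interval. The row_id helper column and the variable names are assumptions made for illustration, and the data are re-entered from the question so that the snippet runs on its own.

```r
# Example data copied from the question
test <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
R_level logp chr start end CNA Gene
2 7.079 11 1159 1360 gain Recl,Bcl
11 2.4 12 6335 6345 loss Pekg
3 19 13 7180 7229 loss Sox1")
control <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
R_level logp chr start end CNA Gene
2 5.9 11 1100 1400 gain Recl,Bcl
2 3.46 11 1002 1345 gain Trp1
2 6.4 12 6705 6845 gain Pekg
4 7 13 6480 8129 loss Sox1")

# Tag each test row so it can be traced back after the merge
test$row_id <- seq_len(nrow(test))

# All (test, control) pairs that share chr and CNA
pairs <- merge(test[, c("row_id", "chr", "CNA", "start", "end")],
               control[, c("chr", "CNA", "start", "end")],
               by = c("chr", "CNA"), suffixes = c(".test", ".ctrl"))

# A pair matches when the test interval lies within the control interval
inside <- pairs$start.test >= pairs$start.ctrl &
          pairs$end.test   <= pairs$end.ctrl
drop_ids <- unique(pairs$row_id[inside])

# Keep the unmatched test rows and drop the helper column
result <- test[!test$row_id %in% drop_ids, names(test) != "row_id"]
print(result)
```

On the example data only the Pekg row remains, because test row 1 lies within the 1100-1400 gain interval on chr 11 and test row 3 lies within the 6480-8129 loss interval on chr 13. Note that merge() builds every same-chr, same-CNA pair, so for large inputs the foverlaps() approach is likely to scale better.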
Source: https://stackoverflow.com/questions/28109327/subset-only-those-rows-whose-intervals-does-not-fall-within-another-data-frame