I have a data frame of n rows and 3 columns:
df <- data.frame(start=c(178,400,983,1932,33653),
end=c(5025,5025, 5535, 6918, 38197),
group=c(1,1,2,2,3))
df
I think this is possible with data.table::foverlaps:
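library(data.table)
setDT(df)                 # convert to a data.table by reference
setkey(df,start,end)      # foverlaps() requires the interval columns to be keyed
df[,row_id:=1:nrow(df)]
# Self-join: one row per pair of overlapping intervals
temp <- foverlaps(df,df)
# Expand start/end to the combined span of the overlapping intervals, once per side of the join
temp[, `:=`(c("start","end"),list(min(start,i.start),max(end,i.end))),by=row_id]
temp[, `:=`(c("start","end"),list(min(start,i.start),max(end,i.end))),by=i.row_id]
# Number each distinct merged span and list the rows that fall in it
temp2 <- temp[, list(group2=.GRP, row_id=unique(c(row_id,i.row_id))),by=.(start,end)][,.(row_id,group2)]
setkey(df,row_id)
setkey(temp2,row_id)
temp2[df]                 # join group2 back onto the original rows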
You'll need the IRanges package:
require(IRanges)
ir <- IRanges(df$start, df$end)
df$group2 <- subjectHits(findOverlaps(ir, reduce(ir)))
> df
#   start   end group group2
# 1   178  5025     1      1
# 2   400  5025     1      1
# 3   983  5535     2      1
# 4  1932  6918     2      1
# 5 33653 38197     3      2
To install IRanges, type these lines in R:
# biocLite() has been retired; BiocManager is the current Bioconductor installer
install.packages("BiocManager")
BiocManager::install("IRanges")
To learn more (manual, etc.), see the IRanges package page on Bioconductor.
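If it helps to see what findOverlaps(ir, reduce(ir)) is doing, here is a rough base R sketch of the same idea: sort by start, track the furthest end seen so far, and open a new cluster whenever an interval starts past it. This is only an illustration, not how IRanges is implemented. The column name group2_base and the helper variables are made up for this example. One assumption to be aware of: as far as I know, reduce() by default also merges adjacent integer ranges (for example one ending at 5 and the next starting at 6), while this sketch only merges intervals that actually overlap or share an endpoint.

```r
# Same example data as in the question
df <- data.frame(start = c(178, 400, 983, 1932, 33653),
                 end   = c(5025, 5025, 5535, 6918, 38197),
                 group = c(1, 1, 2, 2, 3))

# Process the intervals in order of their start position
ord <- order(df$start, df$end)
s <- df$start[ord]
e <- df$end[ord]

# Furthest end among all earlier intervals (-Inf for the first one)
prev_max_end <- c(-Inf, head(cummax(e), -1))

# An interval opens a new cluster when it starts after every earlier one has ended
cluster_sorted <- cumsum(s > prev_max_end)

# Write the cluster ids back in the original row order
# (group2_base is a hypothetical column name used only in this sketch)
df$group2_base <- integer(nrow(df))
df$group2_base[ord] <- cluster_sorted
df
```

On the example data this puts rows 1 to 4 in one cluster and row 5 in another, matching the group2 column above. It needs no extra packages, which may be enough if you only want the cluster ids and nothing else from IRanges.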