I have a dataframe df1:
df1 <- read.table(text=\" Chr06 79641
Chr06 82862
Chr06 387314
Chr06 656098
Chr06 678491
Chr06 1018696\", heade
You can do this using sapply
:
sapply(1:nrow(df1), function(x) any(df1[x,2] >= df2$V2 &
df1[x,2] <= df2$V3 &
df1[x, 1] == df2$V1))
[1] FALSE TRUE TRUE FALSE FALSE TRUE
Using GenomicRanges:
#Convert to Granges objects
gr1 <- GRanges(seqnames = df1$V1,
ranges = IRanges(df1$V2, df1$V2))
gr2 <- GRanges(seqnames = df2$V1,
ranges = IRanges(df2$V2, df2$V3))
#Subset gr1
subsetByOverlaps(gr1, gr2)
# GRanges object with 3 ranges and 0 metadata columns:
# seqnames ranges strand
# <Rle> <IRanges> <Rle>
# [1] Chr06 [ 82862, 82862] *
# [2] Chr06 [ 387314, 387314] *
# [3] Chr06 [1018696, 1018696] *
# -------
# seqinfo: 1 sequence from an unspecified genome; no seqlengths
#Or we can use merge
mergeByOverlaps(gr1, gr2)
# DataFrame with 3 rows and 2 columns
# gr1 gr2
# <GRanges> <GRanges>
# 1 Chr06:*:[ 82862, 82862] Chr06:*:[ 79720, 87043]
# 2 Chr06:*:[ 387314, 387314] Chr06:*:[ 387314, 387371]
# 3 Chr06:*:[1018696, 1018696] Chr06:*:[1018676, 1018736]
Also, look into bedtools:
Collectively, the bedtools utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic: that is, set theory on the genome. For example, bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF. While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.
Here is a data.table
solution as an alternative to GenomicRanges
:
library(data.table)
dt1 <- data.table(df1)[, V3 := V2]
dt2 <- data.table(df2, key = c("V2", "V3"))
foverlaps(dt1, dt2)[V1 == i.V1][, -c(4, 6), with = F]
# V1 V2 V3 i.V3
# 1: Chr06 79720 87043 82862
# 2: Chr06 387314 387371 387314
# 3: Chr06 1018676 1018736 1018696