Data frame: look up a value in a range and return a different column

心在旅途 2021-01-06 05:14

I have two data frames and wish to use the value in one (DF1$pos) to search through two columns in DF2 (DF2$start, DF2$end) and, if it falls within those numbers, return a different column from the matching row of DF2.

2 answers
  • 2021-01-06 05:42

    Perhaps you can use foverlaps from the "data.table" package.

    library(data.table)
    DT1 <- data.table(DF1)
    DT2 <- data.table(DF2)
    setkey(DT2, ID, start, end)
    DT1[, c("start", "end") := pos]  ## I don't know if there's a way around this step...
    foverlaps(DT1, DT2)
    #     ID start  end annot pos i.start i.end
    # 1: chr     1  200    a1  12      12    12
    # 2: chr   540 1002    a3 542     542   542
    # 3: chr   540 1002    a3 674     674   674
    foverlaps(DT1, DT2)[, c("ID", "pos", "annot"), with = FALSE]
    #     ID pos annot
    # 1: chr  12    a1
    # 2: chr 542    a3
    # 3: chr 674    a3
    

    As mentioned by @Arun in the comments, you can also use which = TRUE in foverlaps to extract the relevant values:

    foverlaps(DT1, DT2, which = TRUE)
    #    xid yid
    # 1:   1   1
    # 2:   2   3
    # 3:   3   3
    DT2$annot[foverlaps(DT1, DT2, which = TRUE)$yid]
    # [1] "a1" "a3" "a3"
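
    On the "way around this step" question above: a non-equi join in data.table may avoid the dummy start/end columns altogether. The following is only a sketch under assumptions. The sample tables are reconstructed from the printed output, and the a2 row and the unmatched position 300 are invented for illustration.

```r
library(data.table)

# Hypothetical sample data, reconstructed from the output shown above.
# The a2 range and the position 300 (which matches no range) are invented.
DT1 <- data.table(ID = "chr", pos = c(12L, 542L, 674L, 300L))
DT2 <- data.table(ID = "chr",
                  start = c(1L, 201L, 540L),
                  end   = c(200L, 250L, 1002L),
                  annot = c("a1", "a2", "a3"))

# Non-equi update join: for each row of DT2, find the rows of DT1 with the
# same ID and start <= pos <= end, then copy annot into DT1 by reference.
# Rows of DT1 that fall in no range keep NA in the new column.
DT1[DT2, on = .(ID, pos >= start, pos <= end), annot := i.annot]

print(DT1)
```

    Because the join updates DT1 in place, positions outside every range stay in the table with annot set to NA instead of being dropped.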
    
  • 2021-01-06 06:01

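    You could also use the IRanges package from Bioconductor:

    # biocLite() has been retired; current Bioconductor releases install through BiocManager
    if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
    BiocManager::install("IRanges")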
    library(IRanges)
    DF1N <- with(DF1, IRanges(pos, pos))
    DF2N <- with(DF2, IRanges(start, end))
    DF1$name <- DF2$annot[subjectHits(findOverlaps(DF1N, DF2N))]
    DF1
    #   ID pos name
    #1 chr  12   a1
    #2 chr 542   a3
    #3 chr 674   a3
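
    One caveat with the assignment above: it only lines up when every pos falls in exactly one range, and it does not compare the ID columns. A dependency-free sketch of the same lookup that makes both cases explicit might look like the following. The sample data and the helper name lookup_annot are assumptions made for illustration, not part of the original question.

```r
# Hypothetical sample data, reconstructed from the output shown above.
# The a2 range and the position 300 (which matches no range) are invented.
DF1 <- data.frame(ID = "chr", pos = c(12, 542, 674, 300),
                  stringsAsFactors = FALSE)
DF2 <- data.frame(ID = "chr",
                  start = c(1, 201, 540),
                  end   = c(200, 250, 1002),
                  annot = c("a1", "a2", "a3"),
                  stringsAsFactors = FALSE)

# lookup_annot is a hypothetical helper: for each row of `query` it returns
# annot from the first row of `ranges` with the same ID whose [start, end]
# interval contains pos, or NA when no range matches.
lookup_annot <- function(query, ranges) {
  vapply(seq_len(nrow(query)), function(i) {
    hit <- which(ranges$ID == query$ID[i] &
                 ranges$start <= query$pos[i] &
                 ranges$end >= query$pos[i])
    if (length(hit) > 0) ranges$annot[hit[1]] else NA_character_
  }, character(1))
}

DF1$name <- lookup_annot(DF1, DF2)
print(DF1)
```

    This loops over rows, so it is only reasonable for small inputs. With IRanges itself, passing select = "first" to findOverlaps returns one index per query with NA for misses, which should give the same alignment on large data.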
    