R: subset a data frame based on conditions from another data frame

-上瘾入骨i 2021-01-14 18:33

Here is a problem I am trying to solve. Say I have two data frames like the following:

observations <- data.frame(id = rep(rep(c(1,2,3,4), each=5), 5),
         


        
2 Answers
  • 2021-01-14 19:02

    Here's a proposal with merge:

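    # merge both data frames on id (each observation is paired with every window of its id)
    dat <- merge(observations, sampletimes, by = "id")
    # keep rows whose time lies strictly between time1 and time2, and keep
    # only the first 4 columns (id, time, measurement, location)
    dat2 <- dat[dat$time > dat$time1 & dat$time < dat$time2, seq(4)]
    # sort by time, then by id
    dat2[order(dat2$time, dat2$id), ]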
    

    The result:

        id time measurement location
    11   1    3    7.086246        a
    141  2    3    6.893162        b
    251  3    3   16.052627        c
    376  4    3   -6.559494        d
    47   1    8   11.506810        e
    137  2    8   10.959782        f
    267  3    8   11.079759        g
    402  4    8   11.082015        h
    83   1   13    5.584257        i
    218  2   13   -1.714845        j
    283  3   13  -11.196792        k
    418  4   13    8.887907        l
    99   1   18    1.656558        m
    234  2   18   16.573179        n
    364  3   18    6.522298        o
    454  4   18    1.005123        p
    125  1   23   -1.995719        q
    250  2   23   -6.676464        r
    360  3   23   10.514282        s
    490  4   23    3.863357        t
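The seq(4) above selects the first four columns by position, which depends on the column order that merge produces (the by column first, then the remaining columns of x, then those of y). A minimal sketch of the same merge-then-filter approach on small hypothetical toy data (the column names follow the answer; the values are invented for illustration), selecting the columns by name instead:

```r
# Hypothetical toy data, for illustration only (not the asker's data)
observations <- data.frame(
  id          = rep(c(1, 2), each = 6),
  time        = rep(1:6, times = 2),
  measurement = c(10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25)
)
# Hypothetical sampling windows: one (time1, time2) pair per id and location
sampletimes <- data.frame(
  id       = c(1, 1, 2),
  location = c("a", "b", "c"),
  time1    = c(1, 4, 2),
  time2    = c(3, 6, 4)
)

# Pair every observation with every window of the same id
dat <- merge(observations, sampletimes, by = "id")

# Keep observations strictly inside their window; select columns by name
# so the result does not depend on the column order merge produces
keep <- dat$time > dat$time1 & dat$time < dat$time2
dat2 <- dat[keep, c("id", "time", "measurement", "location")]

# Sort by time, then id, and reset the row names
dat2 <- dat2[order(dat2$time, dat2$id), ]
rownames(dat2) <- NULL
print(dat2)
```

With this toy data, dat2 has three rows: time 2 (id 1, location a), time 3 (id 2, location c) and time 5 (id 1, location b).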
    
  • 2021-01-14 19:04

    Not efficient, but it does the job:

     subset(merge(observations,sampletimes), time > time1 & time < time2)
            id time measurement location time1 time2
        11   1    3    3.180321        a     2     4
        47   1    8    6.040612        e     7     9
        83   1   13   -5.999317        i    12    14
        99   1   18    2.689414        m    17    19
        125  1   23   12.514722        q    22    24
        137  2    8    4.420679        f     7     9
        141  2    3   11.492446        b     2     4
        218  2   13    6.672506        j    12    14
        234  2   18   12.290339        n    17    19
        250  2   23   12.610828        r    22    24
        251  3    3    8.570984        c     2     4
        267  3    8   -7.112291        g     7     9
        283  3   13    6.287598        k    12    14
        360  3   23   11.941846        s    22    24
        364  3   18   -4.199001        o    17    19
        376  4    3    7.133370        d     2     4
        402  4    8   13.477790        h     7     9
        418  4   13    3.967293        l    12    14
        454  4   18   12.845535        p    17    19
        490  4   23   -1.016839        t    22    24
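The subset call above keeps the time1 and time2 columns in its output. A minimal sketch, on small hypothetical toy data (not the asker's), showing that the select argument of subset can drop them in the same call:

```r
# Hypothetical toy data, for illustration only
observations <- data.frame(
  id          = rep(c(1, 2), each = 6),
  time        = rep(1:6, times = 2),
  measurement = c(10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25)
)
sampletimes <- data.frame(
  id       = c(1, 1, 2),
  location = c("a", "b", "c"),
  time1    = c(1, 4, 2),
  time2    = c(3, 6, 4)
)

# Merge on id, keep rows strictly inside each window, and drop the
# window bounds from the result via the select argument
res <- subset(merge(observations, sampletimes, by = "id"),
              time > time1 & time < time2,
              select = -c(time1, time2))
print(res)
```

The row order of res follows merge, which sorts by id; add an order() call afterwards if a different order is needed.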
    

    EDIT

    Since you have more than 5 million rows, you should try a data.table solution:

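    library(data.table)
    # convert both data frames to data.tables
    OBS <- data.table(observations)
    SAM <- data.table(sampletimes)
    # join on id (allow.cartesian = TRUE permits the large many-to-many result), then filter
    merge(OBS, SAM, allow.cartesian = TRUE, by = 'id')[time > time1 & time < time2]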
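The merge above still materializes every (observation, window) pair that shares an id and only then filters. A possible alternative, sketched here assuming data.table 1.9.8 or later (the version that added non-equi joins and the x. prefix in j), is to put the window conditions directly into the on clause of the join, so only matching rows are produced. The toy data are hypothetical:

```r
library(data.table)

# Hypothetical toy data, for illustration only
OBS <- data.table(
  id          = rep(c(1, 2), each = 6),
  time        = rep(1:6, times = 2),
  measurement = c(10, 11, 12, 13, 14, 15, 20, 21, 22, 23, 24, 25)
)
SAM <- data.table(
  id       = c(1, 1, 2),
  location = c("a", "b", "c"),
  time1    = c(1L, 4L, 2L),
  time2    = c(3L, 6L, 4L)
)

# Non-equi join: match on id and on time1 < time < time2 in one step.
# nomatch = 0L drops windows that contain no observation.
# In a non-equi join the join columns of the result carry i's values,
# so x.time is used to return the observation's own time.
res <- OBS[SAM, on = .(id, time > time1, time < time2), nomatch = 0L,
           .(id, time = x.time, measurement, location)]

# Sort by time, then id (in place)
setorder(res, time, id)
print(res)
```

If windows of the same id can overlap, one observation may match several windows and appear more than once in the result, just as with the merge-based answers.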
    