How to combine R dataframes based on constraints on a time column

佛祖请我去吃肉 2021-01-21 16:54

I have two R tables, each with a list of users and a timestamp corresponding to the time that they took a certain action.

The first of these (df1) two tabl

2 Answers
  • 2021-01-21 17:30

    Here is a data.table solution.

    # load data.table and convert the data.frames to data.tables by reference
    library(data.table)
    setDT(df1)
    setDT(df2)
    
    # add copies of the time column, perform the non-equi join, then drop the time column used for the join
    dfDone <- df2[, time2 := time][df1[, time1 := time],
                  on=.(user, time > time)][, time:= NULL]
    
    dfDone
       user               time2               time1
    1:    1 2016-12-01 11:50:11 2016-12-01 08:53:20
    2:    1                <NA> 2016-12-01 12:45:47
    3:    2                <NA> 2016-12-01 15:34:54
    4:    3 2016-12-01 01:19:10 2016-12-01 00:49:50
    

    If you want to reorder the columns, you can use setcolorder:

    setcolorder(dfDone, c("user", "time1", "time2"))
    
    dfDone
       user               time1               time2
    1:    1 2016-12-01 08:53:20 2016-12-01 11:50:11
    2:    1 2016-12-01 12:45:47                <NA>
    3:    2 2016-12-01 15:34:54                <NA>
    4:    3 2016-12-01 00:49:50 2016-12-01 01:19:10
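
    For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same non-equi join. The sample rows in df1 and df2 are an assumption, reconstructed from the output above rather than taken from the original question, and the helper names x, i and res are only illustrative. It also builds the join inputs as new tables, so df1 and df2 are not modified by reference.

    ```r
    library(data.table)

    # Assumed sample data, reconstructed from the output shown above
    df1 <- data.table(
      user = c(1, 1, 2, 3),
      time = as.POSIXct(c("2016-12-01 08:53:20", "2016-12-01 12:45:47",
                          "2016-12-01 15:34:54", "2016-12-01 00:49:50"),
                        tz = "UTC")
    )
    df2 <- data.table(
      user = c(1, 3),
      time = as.POSIXct(c("2016-12-01 11:50:11", "2016-12-01 01:19:10"),
                        tz = "UTC")
    )

    # Build the join inputs as new tables, so df1 and df2 keep their columns
    x <- df2[, .(user, time, time2 = time)]
    i <- df1[, .(user, time, time1 = time)]

    # Non-equi join: for each df1 row, find df2 rows of the same user
    # with a later time; unmatched df1 rows get NA in time2
    res <- x[i, on = .(user, time > time)][, time := NULL]
    setcolorder(res, c("user", "time1", "time2"))
    ```

    Note that if several df2 rows satisfy the condition for one df1 row, each match produces its own row in res.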
    
  • 2021-01-21 17:32

    Part 1 - Original Question

    The first part of your question can be answered with the sqldf package.

    library(sqldf)
    df3 <- sqldf("SELECT * FROM df1 a 
                 LEFT JOIN df2 b ON a.time < b.time 
                 AND a.user = b.user")[,c(1:2, 4)]
    
    # rename the columns to match the OP's post
    names(df3) <- c("user", "time_1", "time_2")
    
    > df3
      user              time_1              time_2
    1    1 2016-12-01 08:53:20 2016-12-01 11:50:11
    2    1 2016-12-01 12:45:47                <NA>
    3    2 2016-12-01 15:34:54                <NA>
    4    3 2016-12-01 00:49:50 2016-12-01 01:19:10
    

    Part 2 - Time Window

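    If you want a window of time to allow for the match, you can subtract seconds within the SQL statement. In the example below, a row from df2 only matches if its time is more than 10000 seconds after the time in df1: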

    df3 <- sqldf("SELECT * FROM df1 a 
                 LEFT JOIN df2 b ON a.time < (b.time - 10000)
                 AND a.user = b.user")[,c(1:2, 4)]
    > df3
      user                time              time.1
    1    1 2016-12-01 08:53:20 2016-12-01 11:50:11
    2    1 2016-12-01 12:45:47                <NA>
    3    2 2016-12-01 15:34:54                <NA>
    4    3 2016-12-01 00:49:50                <NA>
    

    Note that whatever you subtract from b.time is in seconds.
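
    For comparison, the same window can also be sketched as a data.table non-equi join instead of sqldf. This is not the approach used above, just an equivalent under stated assumptions: the sample rows are reconstructed from the printed output, and gap, lower, x, i and res are hypothetical names. The condition a.time < b.time - 10000 is rewritten as b.time > a.time + 10000.

    ```r
    library(data.table)

    # Assumed sample data, reconstructed from the output shown above
    df1 <- data.table(
      user = c(1, 1, 2, 3),
      time = as.POSIXct(c("2016-12-01 08:53:20", "2016-12-01 12:45:47",
                          "2016-12-01 15:34:54", "2016-12-01 00:49:50"),
                        tz = "UTC")
    )
    df2 <- data.table(
      user = c(1, 3),
      time = as.POSIXct(c("2016-12-01 11:50:11", "2016-12-01 01:19:10"),
                        tz = "UTC")
    )

    gap <- 10000  # minimum gap in seconds; adding a number to POSIXct adds seconds

    # df2 side keeps a copy of its time; df1 side carries the shifted lower bound
    x <- df2[, .(user, time, time2 = time)]
    i <- df1[, .(user, time1 = time, lower = time + gap)]

    # Match df2 rows of the same user whose time is later than time1 + gap
    res <- x[i, on = .(user, time > lower)][, time := NULL]
    setcolorder(res, c("user", "time1", "time2"))
    ```

    With this sample data only the first row for user 1 keeps a match, which agrees with the sqldf output above.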
