I have two R tables, each with a list of users and a timestamp corresponding to the time that they took a certain action.
The first of these (df1
) two tabl
Here is a data.table
solution.
# load data.table and make cast data.frames as data.tables
library(data.table)
setDT(df1)
setDT(df2)
# add time variables, perform join and removing merging time variable
dfDone <- df2[, time2 := time][df1[, time1 := time],
on=.(user, time > time)][, time:= NULL]
dfDone
user time2 time1
1: 1 2016-12-01 11:50:11 2016-12-01 08:53:20
2: 1 <NA> 2016-12-01 12:45:47
3: 2 <NA> 2016-12-01 15:34:54
4: 3 2016-12-01 01:19:10 2016-12-01 00:49:50
If you want to order the columns, you could use setcolorder
setcolorder(dfDone, c("user", "time1", "time2"))
dfDone
user time1 time2
1: 1 2016-12-01 08:53:20 2016-12-01 11:50:11
2: 1 2016-12-01 12:45:47 <NA>
3: 2 2016-12-01 15:34:54 <NA>
4: 3 2016-12-01 00:49:50 2016-12-01 01:19:10
The first part of your question can be answered with the sqldf
package.
library(sqldf)
df3 <- sqldf("SELECT * FROM df1 a
LEFT JOIN df2 b ON a.time < b.time
AND a.user = b.user")[,c(1:2, 4)]
#rename to match OP post
names(df3) <- c("user", "time_1", "time_2")
> df3
user time_1 time_2
1 1 2016-12-01 08:53:20 2016-12-01 11:50:11
2 1 2016-12-01 12:45:47 <NA>
3 2 2016-12-01 15:34:54 <NA>
4 3 2016-12-01 00:49:50 2016-12-01 01:19:10
If you want a window of time to allow for the match, you can subtract seconds within the SQL
statement as follows:
df3 <- sqldf("SELECT * FROM df1 a
LEFT JOIN df2 b ON a.time < (b.time - 10000)
AND a.user = b.user")[,c(1:2, 4)]
> df3
user time time.1
1 1 2016-12-01 08:53:20 2016-12-01 11:50:11
2 1 2016-12-01 12:45:47 <NA>
3 2 2016-12-01 15:34:54 <NA>
4 3 2016-12-01 00:49:50 <NA>
Note, whatever you select from b.time
will be in seconds.