Join two data frames in R based on closest timestamp

前端 未结 2 1524
终归单人心
终归单人心 2020-11-30 05:56

Hi I have two tables (table1 and table2 below) and would like to join them based on the closest timestamp to form expected_output. Some kind of solution involving dplyr wou

相关标签:
2条回答
  • 2020-11-30 06:43

    This is liable to be slow, but...

    d   <- function(x,y) abs(x-y) # define the distance function
    idx <- sapply( table1$date, function(x) which.min( d(x,table2$date) )) # find matches
    
    cbind(table1,table2[idx,-1,drop=FALSE])
    #                  date val1 val2
    # 2 2015-07-16 08:55:00   94   90
    # 3 2015-06-20 00:15:00   33   18
    # 1 2015-05-12 15:00:00   53   67
    

    Another way of constructing idx is max.col(-outer(table1$date, table2$date, d)).

    0 讨论(0)
  • 2020-11-30 06:53

    Using rolling joins feature of data.table with roll = "nearest":

    require(data.table) # v1.9.6+
    setDT(table1)[, val2 := setDT(table2)[table1, val2, on = "date", roll = "nearest"]]
    

    Here, val2 column is created by performing a join on the column date with roll = "nearest" option. For each row of table1$date, the closest matching row from table2$date is computed, and val2 for corresponding row is extracted.

    0 讨论(0)
提交回复
热议问题