R - Assign column value based on closest match in second data frame

暖寄归人 2021-01-05 17:03

I have two data frames, logger and df (times are numeric):

logger <- data.frame(
time = c(1280248354:1280248413),
temp = runif(60,min=18,max=24.5)
)

df <- data.frame(
obs = c(1:10),
time = runif(10,min=1280248354,max=1280248413)
)

2 Answers
  • 2021-01-05 17:14

    I'd use data.table for this. It makes joining on keys easy and fast, and its roll = "nearest" argument gives exactly the behaviour you are looking for (in your example data it is not actually needed, because all times from df appear in logger). In the following example I renamed df$time to df$time1 to make it clear which column belongs to which table:

    #  Load package
    require( data.table )
    
    #  Make data.frames into data.tables with a key column
    ldt <- data.table( logger , key = "time" )
    dt <- data.table( df , key = "time1" )
    
    #  Join based on the key column of the two tables (time & time1)
    #  roll = "nearest" gives the desired behaviour
    #  list( obs , time1 , temp ) selects the columns to return from the joined result
    ldt[ dt , list( obs , time1 , temp ) , roll = "nearest" ]
    #          time obs      time1     temp
    # 1: 1280248361   8 1280248361 18.07644
    # 2: 1280248366   4 1280248366 21.88957
    # 3: 1280248370   3 1280248370 19.09015
    # 4: 1280248376   5 1280248376 22.39770
    # 5: 1280248381   6 1280248381 24.12758
    # 6: 1280248383  10 1280248383 22.70919
    # 7: 1280248385   1 1280248385 18.78183
    # 8: 1280248389   2 1280248389 18.17874
    # 9: 1280248393   9 1280248393 18.03098
    #10: 1280248403   7 1280248403 22.74372
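
    To see what roll = "nearest" does when the times do not line up exactly, here is a minimal sketch on made-up data. The table name obs_dt and all values are hypothetical and chosen so that every nearest match is unambiguous; they are not taken from the question.

    ```r
    library(data.table)

    # Hypothetical logger readings, one every 10 time units
    logger <- data.table(time = c(100, 110, 120, 130),
                         temp = c(18.0, 19.5, 21.0, 22.5))

    # Hypothetical observations at times that do NOT appear in logger
    obs_dt <- data.table(obs = 1:3, time = c(103, 118, 126))

    # Key both tables on the time column
    setkey(logger, time)
    setkey(obs_dt, time)

    # For each row of obs_dt, pick the logger row with the nearest time
    res <- logger[obs_dt, roll = "nearest"]
    print(res)
    ```

    With these values, 103 should pick up the reading at 100, 118 the reading at 120, and 126 the reading at 130. Note that the time column in the result holds the observation times, not the logger times.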
    
  • 2021-01-05 17:25

    You could use the data.table package, which also stays efficient on large data:

    library(data.table)
    
    logger <- data.frame(
      time = c(1280248354:1280248413),
      temp = runif(60,min=18,max=24.5)
    )
    
    df <- data.frame(
      obs = c(1:10),
      time = runif(10,min=1280248354,max=1280248413)
    )
    
    logger <- data.table(logger)
    df <- data.table(df)
    
    setkey(df,time)
    setkey(logger,time)
    
    df2 <- logger[df, roll = "nearest"]
    

    Output:

    > df2
              time     temp obs
     1: 1280248356 22.81437   7
     2: 1280248360 24.08711  10
     3: 1280248366 22.31738   2
     4: 1280248367 18.61222   5
     5: 1280248388 19.46300   4
     6: 1280248393 18.26535   6
     7: 1280248400 20.61901   9
     8: 1280248402 21.92584   1
     9: 1280248410 19.36526   8
    10: 1280248410 19.36526   3
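
    In the result of a rolling join the time column carries the values from df, so the logger timestamp that was actually matched is not shown. If you want to check how far each match is, one possible approach (a sketch on hypothetical toy data; the columns logger_time and gap are names introduced here for illustration) is to copy the logger time into a second column before joining:

    ```r
    library(data.table)

    # Hypothetical toy data, chosen so each nearest match is unambiguous
    logger <- data.table(time = c(100, 110, 120, 130),
                         temp = c(18.0, 19.5, 21.0, 22.5))
    df <- data.table(obs = 1:3, time = c(103, 118, 126))

    # Keep a copy of the logger timestamp so it survives the join
    logger[, logger_time := time]

    # Same rolling join, written with on= instead of setkey()
    df2 <- logger[df, on = "time", roll = "nearest"]

    # Distance between each observation and the logger row it was matched to
    df2[, gap := abs(time - logger_time)]
    print(df2)
    ```

    A large gap would flag observations that fall far from any logger reading, which a plain nearest join will otherwise match silently.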
    