Matching timestamped data to closest time in another dataset. Properly vectorized? Faster way?

暖寄归人 2020-12-28 14:53

I have a timestamp in one data frame that I am trying to match to the closest timestamp in a second dataframe, for the purpose of extracting data from the second dataframe.

2 Answers
  • 2020-12-28 15:47

    You can try data.table's rolling join with the roll = "nearest" option:

    library(data.table) # v1.9.6+
    setDT(reference)[data, refvalue, roll = "nearest", on = "datetime"]
    # [1] 5 7 7 8
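
To try the rolling join without the original data, here is a minimal sketch. The `reference` and `data` tables below are hypothetical stand-ins (the question's actual data is not shown here), so the values differ from the output above. It assumes data.table v1.9.6 or later is installed.

```r
library(data.table)

# Hypothetical reference table: timestamps with a value to look up.
reference <- data.table(
  datetime = as.POSIXct(c("2015-09-01 00:00:00", "2015-09-01 01:00:00",
                          "2015-09-01 02:00:00", "2015-09-01 04:00:00"),
                        tz = "UTC"),
  refvalue = c(10, 20, 30, 40)
)

# Hypothetical query table: timestamps to match against the reference.
data <- data.table(
  datetime = as.POSIXct(c("2015-09-01 00:10:00", "2015-09-01 00:50:00",
                          "2015-09-01 02:59:00", "2015-09-01 03:01:00",
                          "2015-09-01 09:00:00"),
                        tz = "UTC")
)

# For each row of `data`, take refvalue from the reference row whose
# datetime is nearest (earlier or later).
nearest <- reference[data, refvalue, roll = "nearest", on = "datetime"]

# Store the matched values alongside the query timestamps.
data[, refvalue := nearest]
print(data$refvalue)
# 10 20 30 40 40
```

The last query time falls after every reference time and still picks up the final reference value, because the join rolls to the closest row in either direction.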
    
  • 2020-12-28 15:49

    I wondered whether this could match a data.table solution for speed. It is a vectorized base-R solution, so it should outperform your apply version, and since it never actually calculates a distance, it might even be faster than the data.table "nearest" approach. It adds half the length of each interval between consecutive reference times to that interval's starting point (with -Inf as the lowest possible value for the first break) to create a set of "mid-breaks", then uses the findInterval function to place each time among those breaks. This assumes reference$datetime is sorted in increasing order, as findInterval requires. The result is a suitable index into the rows of the reference dataset, and the "refvalue" can then be transferred to the data object.

     data$refvalue <- reference$refvalue[
                          findInterval( data$datetime, 
                                         c(-Inf, head(reference$datetime,-1))+
                                         c(0, diff(as.numeric(reference$datetime))/2 )) ]
     # values are [1] 5 7 7 8
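
The mid-break idea can be sketched step by step as follows. The data here is hypothetical (not the question's data), and the names `ref_num`, `mid_breaks`, `idx`, and `brute` are introduced only for illustration. The brute-force check at the end is an addition for verification, not part of the answer's method.

```r
# Hypothetical reference table, sorted by datetime (findInterval needs this).
reference <- data.frame(
  datetime = as.POSIXct(c("2015-09-01 00:00:00", "2015-09-01 01:00:00",
                          "2015-09-01 02:00:00", "2015-09-01 04:00:00"),
                        tz = "UTC"),
  refvalue = c(10, 20, 30, 40)
)

# Hypothetical query timestamps.
data <- data.frame(
  datetime = as.POSIXct(c("2015-09-01 00:10:00", "2015-09-01 00:50:00",
                          "2015-09-01 02:59:00", "2015-09-01 03:01:00",
                          "2015-09-01 09:00:00"),
                        tz = "UTC")
)

# Work in seconds since the epoch.
ref_num <- as.numeric(reference$datetime)

# Breaks: -Inf, then the midpoint between each pair of consecutive
# reference times. Interval i then contains the times nearest to row i.
mid_breaks <- c(-Inf, head(ref_num, -1) + diff(ref_num) / 2)

# Row index of the nearest reference time for every query time.
idx <- findInterval(as.numeric(data$datetime), mid_breaks)
data$refvalue <- reference$refvalue[idx]
print(data$refvalue)
# 10 20 30 40 40

# Cross-check against an explicit (slow) nearest-distance search.
brute <- sapply(as.numeric(data$datetime),
                function(t) which.min(abs(ref_num - t)))
stopifnot(all(idx == brute))
```

A query time that lands exactly on a midpoint is assigned to the later reference row, since findInterval treats each interval as closed on the left.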
    