R - Assign column value based on closest match in second data frame

Submitted by 戏子无情 on 2019-12-19 03:07:14

Question


I have two data frames, logger and df (times are numeric):

logger <- data.frame(
time = c(1280248354:1280248413),
temp = runif(60,min=18,max=24.5)
)

df <- data.frame(
obs = c(1:10),
time = runif(10,min=1280248354,max=1280248413),
temp = NA
)

I would like to search logger$time for the closest match to each row in df$time, and assign the associated logger$temp to df$temp. So far, I have been successful using the following loop:

for (i in 1:length(df$time)){
closestto<-which.min(abs((logger$time) - (df$time[i])))
df$temp[i]<-logger$temp[closestto]
}

However, I now have large data frames (logger has 13,620 rows and df has 266,138 rows) and processing times are long. I've read that loops are not the most efficient way to do this, but I am unfamiliar with the alternatives. Is there a faster way?


Answer 1:


I'd use data.table for this. It makes joining on keys both easy and fast, and its roll = "nearest" argument gives exactly the behaviour you are looking for: each row of df is matched to the logger row with the closest time. In the following example I renamed df$time to df$time1 to make it clear which column belongs to which table.

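#  Load package
library( data.table )

#  Rename df$time to time1 so it is clear which column belongs to which table
names( df )[ names( df ) == "time" ] <- "time1"

#  Make data.frames into data.tables with a key column
ldt <- data.table( logger , key = "time" )
dt <- data.table( df , key = "time1" )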

#  Join based on the key column of the two tables (time & time1)
#  roll = "nearest" gives the desired behaviour
#  list( obs , time1 , temp ) gives the columns to return (obs and time1 from dt, temp from ldt)
ldt[ dt , list( obs , time1 , temp ) , roll = "nearest" ]
#          time obs      time1     temp
# 1: 1280248361   8 1280248361 18.07644
# 2: 1280248366   4 1280248366 21.88957
# 3: 1280248370   3 1280248370 19.09015
# 4: 1280248376   5 1280248376 22.39770
# 5: 1280248381   6 1280248381 24.12758
# 6: 1280248383  10 1280248383 22.70919
# 7: 1280248385   1 1280248385 18.78183
# 8: 1280248389   2 1280248389 18.17874
# 9: 1280248393   9 1280248393 18.03098
#10: 1280248403   7 1280248403 22.74372
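
The join above returns a new table. As a minimal sketch of how the matched temperatures could instead be written back into df$temp, as the question asks, and checked against the original loop, something like the following should work. The names logger_dt, df_dt and loop_temp are made up for this illustration, logger$time is stored as a double so both join columns have the same type, and the x. prefix assumes a reasonably recent data.table version.

```r
library(data.table)

set.seed(42)
logger <- data.frame(
  time = as.numeric(1280248354:1280248413),
  temp = runif(60, min = 18, max = 24.5)
)
df <- data.frame(
  obs  = 1:10,
  time = runif(10, min = 1280248354, max = 1280248413),
  temp = NA_real_
)

# Reference result: the nearest-match logic from the question's loop
loop_temp <- vapply(
  df$time,
  function(t) logger$temp[which.min(abs(logger$time - t))],
  numeric(1)
)

logger_dt <- as.data.table(logger)
df_dt     <- as.data.table(df)

# Rolling join: one result row per row of df_dt, in df_dt's row order.
# x.temp is the temp column of logger_dt (the table being joined to).
df_dt[, temp := logger_dt[df_dt, on = "time", roll = "nearest", x.temp]]

# TRUE if the join reproduces the loop's result
isTRUE(all.equal(df_dt$temp, loop_temp))
```

Because the join returns its rows in the order of df_dt, the result can be assigned straight back with := without re-sorting.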



Answer 2:


You could use the data.table package, which is also much more efficient on large data:

library(data.table)

logger <- data.frame(
  time = c(1280248354:1280248413),
  temp = runif(60,min=18,max=24.5)
)

df <- data.frame(
  obs = c(1:10),
  time = runif(10,min=1280248354,max=1280248413)
)

logger <- data.table(logger)
df <- data.table(df)

setkey(df,time)
setkey(logger,time)

df2 <- logger[df, roll = "nearest"]

Output:

> df2
          time     temp obs
 1: 1280248356 22.81437   7
 2: 1280248360 24.08711  10
 3: 1280248366 22.31738   2
 4: 1280248367 18.61222   5
 5: 1280248388 19.46300   4
 6: 1280248393 18.26535   6
 7: 1280248400 20.61901   9
 8: 1280248402 21.92584   1
 9: 1280248410 19.36526   8
10: 1280248410 19.36526   3
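
Note that the time column in this result holds the values from df, not the logger timestamps they were matched to. If the matched timestamp is also wanted, one option is to copy it into a second column before joining. The sketch below assumes that approach; logger_time, matched and gap are names chosen only for this illustration, and logger's time is stored as a double so both join columns have the same type.

```r
library(data.table)

set.seed(1)
logger <- data.table(
  time = as.numeric(1280248354:1280248413),
  temp = runif(60, min = 18, max = 24.5)
)
df <- data.table(
  obs  = 1:10,
  time = runif(10, min = 1280248354, max = 1280248413)
)

# Keep a copy of the logger timestamp; the join column itself
# takes its values from df in the result
logger[, logger_time := time]

matched <- logger[df, on = "time", roll = "nearest"]

# Distance in seconds between each observation and its matched logger reading
matched[, gap := abs(time - logger_time)]
matched
```

With one logger reading per second and all df times inside the logger's range, gap should never exceed 0.5, which makes it a cheap sanity check on the join.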


Source: https://stackoverflow.com/questions/19957725/r-assign-column-value-based-on-closest-match-in-second-data-frame
