问题
I have two data frames, logger and df (times are numeric):
logger <- data.frame(
time = c(1280248354:1280248413),
temp = runif(60,min=18,max=24.5)
)
df <- data.frame(
obs = c(1:10),
time = runif(10,min=1280248354,max=1280248413),
temp = NA
)
I would like to search logger$time for the closest match to each row in df$time, and assign the associated logger$temp to df$temp. So far, I have been successful using the following loop:
for (i in 1:length(df$time)){
closestto<-which.min(abs((logger$time) - (df$time[i])))
df$temp[i]<-logger$temp[closestto]
}
However, I now have large data frames (logger has 13,620 rows and df has 266138) and processing times are long. I've read that loops are not the most efficient way to do things, but I am unfamiliar with alternatives. Is there a faster way to do this?
回答1:
I'd use data.table
for this. It makes it super easy and super fast joining on keys
. There is even a really helpful roll = "nearest"
argument for exactly the behaviour you are looking for (except in your example data it is not necessary because all times
from df
appear in logger
). In the following example I renamed df$time
to df$time1
to make it clear which column belongs to which table...
# Load package
require( data.table )
# Make data.frames into data.tables with a key column
ldt <- data.table( logger , key = "time" )
dt <- data.table( df , key = "time1" )
# Join based on the key column of the two tables (time & time1)
# roll = "nearest" gives the desired behaviour
# list( obs , time1 , temp ) gives the columns you want to return from dt
ldt[ dt , list( obs , time1 , temp ) , roll = "nearest" ]
# time obs time1 temp
# 1: 1280248361 8 1280248361 18.07644
# 2: 1280248366 4 1280248366 21.88957
# 3: 1280248370 3 1280248370 19.09015
# 4: 1280248376 5 1280248376 22.39770
# 5: 1280248381 6 1280248381 24.12758
# 6: 1280248383 10 1280248383 22.70919
# 7: 1280248385 1 1280248385 18.78183
# 8: 1280248389 2 1280248389 18.17874
# 9: 1280248393 9 1280248393 18.03098
#10: 1280248403 7 1280248403 22.74372
回答2:
You could use the data.table
library. This will also help with being more efficient with large data size -
library(data.table)
logger <- data.frame(
time = c(1280248354:1280248413),
temp = runif(60,min=18,max=24.5)
)
df <- data.frame(
obs = c(1:10),
time = runif(10,min=1280248354,max=1280248413)
)
logger <- data.table(logger)
df <- data.table(df)
setkey(df,time)
setkey(logger,time)
df2 <- logger[df, roll = "nearest"]
Output -
> df2
time temp obs
1: 1280248356 22.81437 7
2: 1280248360 24.08711 10
3: 1280248366 22.31738 2
4: 1280248367 18.61222 5
5: 1280248388 19.46300 4
6: 1280248393 18.26535 6
7: 1280248400 20.61901 9
8: 1280248402 21.92584 1
9: 1280248410 19.36526 8
10: 1280248410 19.36526 3
来源:https://stackoverflow.com/questions/19957725/r-assign-column-value-based-on-closest-match-in-second-data-frame