I have two dataframes like so:
set.seed(1)
df <- cbind(expand.grid(x=1:3, y=1:5), time=round(runif(15)*30))
to.merge <- data.frame(x=c(2, 2, 2, 3, 2), y=c(1, 1, 1, 5, 4), time=c(17, 12, 11.6, 22.5, 2),
                       val=letters[1:5], stringsAsFactors=FALSE)
Using merge a couple of times and aggregate once, here is how to do it.
set.seed(1)
df <- cbind(expand.grid(x = 1:3, y = 1:5), time = round(runif(15) * 30))
to.merge <- data.frame(x = c(2, 2, 2, 3, 2), y = c(1, 1, 1, 5, 4),
                       time = c(17, 12, 11.6, 22.5, 2), val = letters[1:5],
                       stringsAsFactors = FALSE)
#Find rows that match by x and y
res <- merge(to.merge, df, by = c("x", "y"), all.x = TRUE)
res$dif <- abs(res$time.x - res$time.y)
res
## x y time.x val time.y dif
## 1 2 1 17.0 a 11 6.0
## 2 2 1 12.0 b 11 1.0
## 3 2 1 11.6 c 11 0.6
## 4 2 4 2.0 e 6 4.0
## 5 3 5 22.5 d 23 0.5
#Find rows that need to be merged
res1 <- merge(aggregate(dif ~ x + y, data = res, FUN = min), res)
res1
## x y dif time.x val time.y
## 1 2 1 0.6 11.6 c 11
## 2 2 4 4.0 2.0 e 6
## 3 3 5 0.5 22.5 d 23
#Finally merge the result back into df
final <- merge(df, res1[res1$dif <= 1, c("x", "y", "val")], all.x = TRUE)
final
## x y time val
## 1 1 1 8 <NA>
## 2 1 2 27 <NA>
## 3 1 3 28 <NA>
## 4 1 4 2 <NA>
## 5 1 5 21 <NA>
## 6 2 1 11 c
## 7 2 2 6 <NA>
## 8 2 3 20 <NA>
## 9 2 4 6 <NA>
## 10 2 5 12 <NA>
## 11 3 1 17 <NA>
## 12 3 2 27 <NA>
## 13 3 3 19 <NA>
## 14 3 4 5 <NA>
## 15 3 5 23 d
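The three steps above can be wrapped in a small function so that the tolerance becomes a parameter. This is only a sketch: nearest_merge and tol are names made up here, not part of the answer, and it assumes df has exactly one row per x/y combination, as in the example data.

```r
# Hypothetical helper: the merge / aggregate / merge steps in one place.
nearest_merge <- function(df, to.merge, tol = 1) {
  # Step 1: pair every to.merge row with the df row sharing x and y
  res <- merge(to.merge, df, by = c("x", "y"), all.x = TRUE)
  res$dif <- abs(res$time.x - res$time.y)
  # Step 2: keep only the closest candidate per (x, y)
  best <- merge(aggregate(dif ~ x + y, data = res, FUN = min), res)
  # Step 3: attach val to df where the closest candidate is within tol
  merge(df, best[best$dif <= tol, c("x", "y", "val")], all.x = TRUE)
}

set.seed(1)
df <- cbind(expand.grid(x = 1:3, y = 1:5), time = round(runif(15) * 30))
to.merge <- data.frame(x = c(2, 2, 2, 3, 2), y = c(1, 1, 1, 5, 4),
                       time = c(17, 12, 11.6, 22.5, 2), val = letters[1:5],
                       stringsAsFactors = FALSE)
final <- nearest_merge(df, to.merge)
```

If two candidates for the same x/y are equally close, both survive step 2 and the corresponding df row will appear twice in the result, so ties would need extra handling.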
mnel's answer uses roll = "nearest" in a data.table join but does not limit to +/- 1 as requested by the OP. In addition, MichaelChirico has suggested using the on parameter.
This approach uses roll = "nearest"; setDT() to coerce a data.frame to a data.table without copying (introduced 2014-02-27 with v1.9.2 of data.table); and the on parameter, which avoids having to set a key explicitly (introduced 2015-09-19 with v1.9.6). So, the code below
library(data.table) # version 1.11.4 used
setDT(df)[setDT(to.merge), on = .(x, y, time), roll = "nearest",
val := replace(val, abs(x.time - i.time) > 1, NA)]
df
has updated df:
     x y time  val
 1:  1 1    8 <NA>
 2:  2 1   11    c
 3:  3 1   17 <NA>
 4:  1 2   27 <NA>
 5:  2 2    6 <NA>
 6:  3 2   27 <NA>
 7:  1 3   28 <NA>
 8:  2 3   20 <NA>
 9:  3 3   19 <NA>
10:  1 4    2 <NA>
11:  2 4    6 <NA>
12:  3 4    5 <NA>
13:  1 5   21 <NA>
14:  2 5   12 <NA>
15:  3 5   23    d
Note that the order of rows has not been changed (in contrast to Chinmay Patil's answer).
In case df must not be changed, a new data.table can be created by
result <- setDT(to.merge)[setDT(df), on = .(x, y, time), roll = "nearest",
.(x, y, time, val = replace(val, abs(x.time - i.time) > 1, NA))]
result
which returns the same result as above.
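To see what roll = "nearest" and the distance filter each contribute, here is a toy sketch on made-up data (lookup, query and the column names matched and label are invented for illustration; it needs a data.table version that supports on and the x. prefix, as the code above does).

```r
library(data.table)

# Made-up lookup table and query times, purely for illustration
lookup <- data.table(time = c(10, 20, 30), label = c("a", "b", "c"))
query  <- data.table(time = c(11, 19, 26))

# roll = "nearest" always finds a partner, however far away it is
res <- lookup[query, on = "time", roll = "nearest",
              .(time = i.time, matched = x.time, label)]

# Blank out partners that are more than 1 time unit away
res[, label := replace(label, abs(matched - time) > 1, NA)]
res
```

The query at 26 is first paired with 30 (the nearest lookup time) and only loses its label in the second step, which is the same job that replace() does in the join above.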
Use data.table and roll='nearest' or, to limit to 1, roll = 1, rollends = c(TRUE,TRUE), e.g.
library(data.table)
# create data.tables with the same key columns (x, y, time)
DT <- data.table(df, key = names(df))
tm <- data.table(to.merge, key = key(DT))
# use join syntax with roll = 'nearest'
tm[DT, roll='nearest']
# x y time val
# 1: 1 1 8 NA
# 2: 1 2 27 NA
# 3: 1 3 28 NA
# 4: 1 4 2 NA
# 5: 1 5 21 NA
# 6: 2 1 11 c
# 7: 2 2 6 NA
# 8: 2 3 20 NA
# 9: 2 4 6 e
# 10: 2 5 12 NA
# 11: 3 1 17 NA
# 12: 3 2 27 NA
# 13: 3 3 19 NA
# 14: 3 4 5 NA
# 15: 3 5 23 d
You can limit yourself to looking forward and back (by 1) by setting roll=-1 and rollends = c(TRUE,TRUE):
new <- tm[DT, roll=-1, rollends =c(TRUE,TRUE)]
new
# x y time val
# 1: 1 1 8 NA
# 2: 1 2 27 NA
# 3: 1 3 28 NA
# 4: 1 4 2 NA
# 5: 1 5 21 NA
# 6: 2 1 11 c
# 7: 2 2 6 NA
# 8: 2 3 20 NA
# 9: 2 4 6 NA
# 10: 2 5 12 NA
# 11: 3 1 17 NA
# 12: 3 2 27 NA
# 13: 3 3 19 NA
# 14: 3 4 5 NA
# 15: 3 5 23 d
Or you can use roll=1 first, then roll=-1, and then combine the results (tidying up the val.1 column from the second rolling join):
new <- tm[DT, roll = 1][tm[DT,roll=-1]][is.na(val), val := ifelse(is.na(val.1),val,val.1)][,val.1 := NULL]
new
# x y time val
# 1: 1 1 8 NA
# 2: 1 2 27 NA
# 3: 1 3 28 NA
# 4: 1 4 2 NA
# 5: 1 5 21 NA
# 6: 2 1 11 c
# 7: 2 2 6 NA
# 8: 2 3 20 NA
# 9: 2 4 6 NA
# 10: 2 5 12 NA
# 11: 3 1 17 NA
# 12: 3 2 27 NA
# 13: 3 3 19 NA
# 14: 3 4 5 NA
# 15: 3 5 23 d
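For anyone unsure which direction each sign of roll looks in, here is a minimal sketch on made-up data (lookup and query are invented for illustration, and on is used instead of keys just to keep it short). A positive roll carries the previous value forward by at most that amount; a negative roll carries the next value backward.

```r
library(data.table)

# Made-up data: two known times and three query times between them
lookup <- data.table(time = c(10, 20), val = c("a", "b"))
query  <- data.table(time = c(10.5, 15, 19.5))

# roll = 1: the previous value is carried forward by at most 1
fwd <- lookup[query, on = "time", roll = 1]$val
# roll = -1: the next value is carried backward by at most 1
bwd <- lookup[query, on = "time", roll = -1]$val

# Combine the two directions, preferring the forward match
combined <- ifelse(is.na(fwd), bwd, fwd)
```

Here fwd only matches 10.5, bwd only matches 19.5, and 15 is matched by neither. Note that preferring the forward match is not the same as picking the nearer of the two when both directions find something within 1.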