Find the minimum distance between two data frames, for each element in the second data frame


I have two data frames, ev1 and ev2, describing timestamps of two types of events collected over many tests. Each data frame has the columns "test_id" and "timestamp". W

2 Answers

抹茶落季

2021-01-01 03:51

Maybe this helps:

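library(data.table)
# Key ev1 by test_id so that ev1[ev2] joins on it
setkey(setDT(ev1), test_id)
# Pair every ev2 row with all ev1 rows of the same test; i.time is ev2's time
DT <- ev1[ev2, allow.cartesian=TRUE][, distance := i.time - time]
# Keep the rows whose absolute distance is minimal for each ev2 event
DT[DT[, abs(distance) == min(abs(distance)), by=list(test_id, i.time)]$V1]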
#    test_id time i.time distance
#1:       0    3      6        3
#2:       0    1      1        0
#3:       0    3      8        5
#4:       1    4      4        0
#5:       1    4      5        1
#6:       1    4     11        7

Or, in a single chain:

ev1[ev2, allow.cartesian=TRUE][, distance := i.time - time][,
      .SD[abs(distance) == min(abs(distance))], by=list(test_id, i.time)]

Update

Using the new grouping

setkey(setDT(ev1), test_id, group_id)
setkey(setDT(ev2), test_id, group_id)
DT <- ev1[ev2, allow.cartesian=TRUE][,distance:=i.time-time]
DT[DT[,abs(distance)==min(abs(distance)), by=list(test_id, 
                                group_id,i.time)]$V1]$distance
#[1]  2  3  4 -1  0  4

This matches the result of the code you provided:

min_data$distance
#[1]  2  3  4 -1  0  4


爱一瞬间的悲伤

2021-01-01 04:06
Here's how I'd do it using data.table:
```
require(data.table)
setkey(setDT(ev1), test_id)
ev1[ev2, .(ev2.time = i.time, ev1.time = time[which.min(abs(i.time - time))]), by = .EACHI]
#    test_id ev2.time ev1.time
# 1:       0        6        3
# 2:       0        1        1
# 3:       0        8        3
# 4:       1        4        4
# 5:       1        5        4
# 6:       1       11        4
```
In joins of the form x[i] in data.table, the prefix i. is used to refer to the columns in i when x and i share the same name for a particular column.
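As a small illustration of the i. prefix, here is a sketch on hypothetical sample data. The question's data is not shown here, so the values below are made up to agree with the output printed above; the column names `ev1.time` and `ev2.time` are just labels for the illustration.

```r
library(data.table)

# Hypothetical sample data (an assumption, not the asker's actual data)
ev1 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
ev2 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(6, 1, 8, 4, 5, 11))
setkey(ev1, test_id)

# Both tables have a `time` column. Inside j, `time` is ev1's column (x)
# and `i.time` is ev2's column (i).
res <- ev1[ev2, .(ev1.time = time, ev2.time = i.time), allow.cartesian = TRUE]

# Each ev2 row is paired with every ev1 row of the same test
head(res, 3)
```

Without the prefix, `time` alone would always mean ev1's column, so there would be no way to compute `i.time - time` inside the join.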

Please see this SO post for an explanation on how this works.

This syntax makes it easier to see what is going on, and it is memory efficient (at the expense of a little speed¹) because it never materialises the entire join result. In fact, it does exactly what you describe in your post: it filters on the fly while merging.
1. Speed rarely matters in practice. If i has a lot of rows, this might be a tad slower because the j-expression has to be evaluated for each row in i. In contrast, @akrun's answer does a cartesian join followed by a single filtering step, so while it uses more memory, it doesn't evaluate j for each row in i. But again, this shouldn't matter unless you work with a really large i, which is not often the case.
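To make the trade-off concrete, here is a rough sketch that runs both strategies side by side and checks that they pick the same nearest ev1 time. The data is hypothetical (made up to agree with the output shown earlier), and the names `full`, `nearest_full` and `nearest_eachi` are only for this illustration.

```r
library(data.table)

# Hypothetical sample data (an assumption, not the asker's actual data)
ev1 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
ev2 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(6, 1, 8, 4, 5, 11))
setkey(ev1, test_id)

# Strategy 1: materialise the whole join first (6 ev2 rows x 3 matches = 18 rows),
# then filter once per (test_id, i.time) group.
full <- ev1[ev2, allow.cartesian = TRUE]
full[, distance := i.time - time]
nearest_full <- full[, .SD[which.min(abs(distance))], by = .(test_id, i.time)]

# Strategy 2: by = .EACHI evaluates j once per ev2 row while joining,
# so the 18-row intermediate table is never built.
nearest_eachi <- ev1[ev2,
                     .(ev2.time = i.time,
                       ev1.time = time[which.min(abs(i.time - time))]),
                     by = .EACHI]

# Both strategies select the same ev1 time for every ev2 event
all(nearest_full$time == nearest_eachi$ev1.time)
```

One difference to keep in mind: `which.min` returns only the first minimum, whereas filtering with `abs(distance) == min(abs(distance))` keeps every tied row.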
HTH