Find the minimum distance between two data frames, for each element in the second data frame

野性不改 2021-01-01 03:28

I have two data frames, ev1 and ev2, describing timestamps of two types of events collected over many tests. So, each data frame has the columns "test_id" and "timestamp". W

2 Answers
  • 2021-01-01 03:51

    Maybe this helps:

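    library(data.table)
    # Key ev1 by test_id so that ev1[ev2] joins on it
    setkey(setDT(ev1), test_id)
    # Cartesian join within each test_id, then the signed distance (ev2 time minus ev1 time)
    DT <- ev1[ev2, allow.cartesian=TRUE][, distance := i.time - time]
    # Keep the rows whose absolute distance is minimal for each ev2 event
    DT[DT[, abs(distance) == min(abs(distance)), by=list(test_id, i.time)]$V1]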
    #    test_id time i.time distance
    #1:       0    3      6        3
    #2:       0    1      1        0
    #3:       0    3      8        5
    #4:       1    4      4        0
    #5:       1    4      5        1
    #6:       1    4     11        7
    

    Or

    ev1[ev2, allow.cartesian=TRUE][, distance := i.time - time][,
         .SD[abs(distance) == min(abs(distance))], by=list(test_id, i.time)]
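
    To make the cartesian-join approach above easy to try, here is a minimal self-contained sketch. The sample values in ev1 and ev2 are assumptions made up for illustration (the question's data is not shown here), and the column is assumed to be called time, as in the code above.

    ```r
    library(data.table)

    # Hypothetical sample data (an assumption, not taken from the question).
    ev1 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
    ev2 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(6, 1, 8, 4, 5, 11))

    # Key ev1 so that ev1[ev2] joins on test_id.
    setkey(ev1, test_id)

    # Pair every ev2 row with every ev1 row that has the same test_id.
    DT <- ev1[ev2, allow.cartesian = TRUE]
    DT[, distance := i.time - time]

    # For each ev2 event, keep the ev1 row(s) at the minimum absolute distance.
    nearest <- DT[, .SD[abs(distance) == min(abs(distance))],
                  by = .(test_id, i.time)]
    print(nearest)
    ```

    Note that if two ev1 times are equally close to an ev2 time, this filter keeps both rows.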
    

    Update

    Using the new grouping column (group_id):

    setkey(setDT(ev1), test_id, group_id)
    setkey(setDT(ev2), test_id, group_id)
    DT <- ev1[ev2, allow.cartesian=TRUE][,distance:=i.time-time]
    DT[DT[,abs(distance)==min(abs(distance)), by=list(test_id, 
                                    group_id,i.time)]$V1]$distance
    #[1]  2  3  4 -1  0  4
    

    The code you provided gives the same distances:

    min_data$distance
    #[1]  2  3  4 -1  0  4
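
    For reference, a small sketch of the grouped variant with a two-column key. The values, including the group_id column, are hypothetical and chosen only for illustration, so the distances differ from the ones printed above.

    ```r
    library(data.table)

    # Hypothetical data with an extra group_id column (an assumption).
    ev1 <- data.table(test_id  = c(0, 0, 1, 1),
                      group_id = c(1, 2, 1, 1),
                      time     = c(1, 5, 2, 9))
    ev2 <- data.table(test_id  = c(0, 0, 1),
                      group_id = c(1, 2, 1),
                      time     = c(3, 4, 8))

    # Key both tables on the two join columns.
    setkey(ev1, test_id, group_id)
    setkey(ev2, test_id, group_id)

    # Join on (test_id, group_id), then compute the signed distance.
    DT <- ev1[ev2, allow.cartesian = TRUE][, distance := i.time - time]

    # Keep the closest ev1 row(s) for each ev2 event within its group.
    nearest <- DT[, .SD[abs(distance) == min(abs(distance))],
                  by = .(test_id, group_id, i.time)]
    print(nearest$distance)
    ```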
    
  • 2021-01-01 04:06

    Here's how I'd do it using data.table:

    require(data.table)
    setkey(setDT(ev1), test_id)
    ev1[ev2, .(ev2.time = i.time, ev1.time = time[which.min(abs(i.time - time))]), by = .EACHI]
    #    test_id ev2.time ev1.time
    # 1:       0        6        3
    # 2:       0        1        1
    # 3:       0        8        3
    # 4:       1        4        4
    # 5:       1        5        4
    # 6:       1       11        4
    

    In joins of the form x[i] in data.table, the prefix i. is used to refer to the columns of i when x and i share the same name for a particular column.
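
    A tiny sketch of the i. prefix in isolation. The tables x_dt and i_dt and their values are hypothetical, made up only to show which column each name resolves to.

    ```r
    library(data.table)

    # Hypothetical toy tables; both have a column called `time`.
    x_dt <- data.table(test_id = c(0, 1), time = c(10, 20), key = "test_id")
    i_dt <- data.table(test_id = c(0, 1), time = c(11, 25))

    # Inside j, `time` is x_dt's column and `i.time` is i_dt's column.
    res <- x_dt[i_dt, .(x_time = time, i_time = i.time, diff = i.time - time)]
    print(res)
    ```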

    Please see this SO post for an explanation on how this works.

    This is syntactically more straightforward, and it is memory efficient (at the expense of a little speed [1]) because it doesn't materialise the entire join result. In fact, it does exactly what you describe in your post: it filters on the fly while merging.

    1. On speed: in most cases it doesn't really matter. If there are a lot of rows in i, it might be a tad slower, as the j-expression has to be evaluated for each row in i. In contrast, @akrun's answer does a cartesian join followed by a single filtering step, so while it uses more memory, it doesn't evaluate j for each row in i. But again, this shouldn't matter unless you work with a really large i, which is not often the case.
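
    To illustrate the memory point, the sketch below compares the number of rows each approach produces. The sample data is hypothetical (an assumption for illustration), and row counts are used only as a rough stand-in for memory use.

    ```r
    library(data.table)

    # Hypothetical sample data (an assumption, not taken from the question).
    ev1 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(1, 2, 3, 2, 3, 4))
    ev2 <- data.table(test_id = c(0, 0, 0, 1, 1, 1), time = c(6, 1, 8, 4, 5, 11))
    setkey(ev1, test_id)

    # Cartesian approach: one row per (ev2 row, matching ev1 row) pair.
    full_join_rows <- nrow(ev1[ev2, allow.cartesian = TRUE])

    # by = .EACHI approach: j is evaluated once per ev2 row, giving one row each.
    each_i <- ev1[ev2,
                  .(ev2.time = i.time,
                    ev1.time = time[which.min(abs(i.time - time))]),
                  by = .EACHI]

    cat("rows in full join:", full_join_rows, "\n")
    cat("rows with by = .EACHI:", nrow(each_i), "\n")
    ```

    One behavioural difference to keep in mind: which.min() returns only the first minimum, so ties yield a single row here, whereas the filter on abs(distance) == min(abs(distance)) keeps every tied row.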

    HTH
