Quickest way to find closest elements in an array in R

青春惊慌失措 2021-01-29 00:21

I would like to find the fastest way in R to identify the indexes of the elements in the Ytimes array that are closest to given Xtimes values.

So far I have been using a simple for-loop.

3 Answers
  •  一整个雨季
    2021-01-29 01:16

    We can use findInterval to do this efficiently: it is vectorized over the query values and runs in O(n log N) time for n queries and N breaks. (cut will also work, with a little more work.)
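    For comparison, here is a minimal sketch of the cut variant, assuming the same midpoint breaks built in the next step and sorted reference values. The variable names (`idx_cut`, `idx_fi`) are illustrative only. With `right = FALSE`, cut uses left-closed intervals [a, b), which is findInterval's default, and `labels = FALSE` returns integer codes instead of a factor:

```r
# Sketch: nearest-neighbour lookup via cut() instead of findInterval().
# Assumes y is sorted in increasing order (toy data from this answer).
y  <- c(1, 3, 5, 10, 20)
y2 <- c(-Inf, y + c(diff(y) / 2, Inf))   # breaks at the midpoints between neighbours

x <- c(1, 1.9, 2.1, 8)

# right = FALSE -> intervals [a, b), same convention as findInterval();
# labels = FALSE -> integer interval codes rather than a factor
idx_cut <- cut(x, breaks = y2, labels = FALSE, right = FALSE)
idx_fi  <- findInterval(x, y2)

idx_cut                     # [1] 1 1 2 4
identical(idx_cut, idx_fi)  # [1] TRUE
```

    The extra work is mostly remembering those two arguments; otherwise the two calls return the same interval codes here.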

    First, let's offset each Ytimes value by half the gap to the next one, so that the breaks sit at the midpoints and findInterval returns the nearest value rather than the next-lesser one. I'll demonstrate on fake data first:

    y <- c(1,3,5,10,20)
    y2 <- c(-Inf, y + c(diff(y)/2, Inf))
    cbind(y, y2[-1])
    #       y     
    # [1,]  1  2.0
    # [2,]  3  4.0
    # [3,]  5  7.5
    # [4,] 10 15.0
    # [5,] 20  Inf
    findInterval(c(1, 1.9, 2.1, 8), y2)
    # [1] 1 1 2 4
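    The same steps can be wrapped in a small function that returns indexes directly, which is what the question asks for. `nearest_index` is a hypothetical helper name, not a base R function; the sketch assumes `y` is already sorted and cross-checks the result against the brute-force `which.min` approach:

```r
# Hypothetical helper (not part of base R): for each value of x, the index of
# the closest element of the sorted vector y, using midpoint breaks.
nearest_index <- function(x, y) {
  stopifnot(!is.unsorted(y))                   # findInterval() needs sorted breaks
  breaks <- c(-Inf, y + c(diff(y) / 2, Inf))   # halfway points between neighbours
  findInterval(x, breaks)
}

y <- c(1, 3, 5, 10, 20)
x <- c(1, 1.9, 2.1, 8, 100, -5)   # includes values outside range(y)

idx <- nearest_index(x, y)
idx      # [1] 1 1 2 4 5 1
y[idx]   # [1]  1  1  3 10 20  1

# Cross-check against the brute-force which.min() approach
brute <- vapply(x, function(v) which.min(abs(y - v)), integer(1))
identical(idx, brute)   # [1] TRUE
```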
    

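    The second column, prepended with -Inf, gives us the breaks. Notice that each break is halfway between the corresponding value and its follower, so every value between two consecutive breaks is closest to the same y element, and the -Inf/Inf ends catch values outside range(y). As with any findInterval breaks, Ytimes must be sorted in increasing order.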
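    If the reference vector is not sorted, one option (a sketch, not part of the original answer) is to sort it once with `order()`, search the sorted copy, and map the positions back. The unsorted `y` below is hypothetical example data:

```r
# Sketch: nearest-element indexes when the reference vector is NOT sorted.
y <- c(10, 1, 20, 5, 3)   # hypothetical unsorted reference values
x <- c(1, 1.9, 2.1, 8)

o  <- order(y)            # permutation that sorts y increasingly
ys <- y[o]
breaks <- c(-Inf, ys + c(diff(ys) / 2, Inf))

# positions in the sorted copy, mapped back to indexes of the ORIGINAL y
idx <- o[findInterval(x, breaks)]
idx      # [1] 2 2 5 1
y[idx]   # [1]  1  1  3 10
```

    The `order()` call costs O(N log N) once, so it is still cheap compared with scanning all of y for every query.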

    Okay, let's apply this to your vectors:

    Y2 <- Ytimes + c(diff(Ytimes)/2, Inf)
    head(cbind(Ytimes, Y2))
    #         Ytimes         Y2
    # [1,] 0.0000000 0.06006006
    # [2,] 0.1201201 0.18018018
    # [3,] 0.2402402 0.30030030
    # [4,] 0.3603604 0.42042042
    # [5,] 0.4804805 0.54054054
    # [6,] 0.6006006 0.66066066
    
    Y2 <- c(-Inf, Ytimes + c(diff(Ytimes)/2, Inf))
    cbind(Xtimes, Y2[ findInterval(Xtimes, Y2) ])
    #       Xtimes            
    #  [1,]      1   0.9009009
    #  [2,]      5   4.9849850
    #  [3,]      8   7.9879880
    #  [4,]     10   9.9099099
    #  [5,]     15  14.9549550
    #  [6,]     19  18.9189189
    #  [7,]     23  22.8828829
    #  [8,]     34  33.9339339
    #  [9,]     45  44.9849850
    # [10,]     51  50.9909910
    # [11,]     55  54.9549550
    # [12,]     57  56.9969970
    # [13,]     78  77.8978979
    # [14,]    120 119.9399399
    

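    (I'm using cbind just for side-by-side demonstration; it isn't necessary. Note that the second column is Y2[...], i.e. the lower break of each matching interval, not the matched Ytimes value: findInterval(Xtimes, Y2) is already the vector of indexes you asked for, and Ytimes[findInterval(Xtimes, Y2)] gives the closest values themselves, as in the benchmark.)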
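    The question's data definitions are cut off on this page, so the sketch below uses a hypothetical stand-in to make the example reproducible. Xtimes is copied from the printed table; Ytimes is an ASSUMPTION (1000 evenly spaced points from 0 to 120), chosen because it is consistent with the printed head() and break values. It shows the index, the break, and the matched value side by side:

```r
# Hypothetical stand-in data: Xtimes copied from the printed table above;
# Ytimes is ASSUMED (1000 evenly spaced points from 0 to 120), since the
# question's own definition is not shown.
Xtimes <- c(1, 5, 8, 10, 15, 19, 23, 34, 45, 51, 55, 57, 78, 120)
Ytimes <- seq(0, 120, length.out = 1000)

Y2  <- c(-Inf, Ytimes + c(diff(Ytimes) / 2, Inf))
idx <- findInterval(Xtimes, Y2)   # indexes of the closest Ytimes

# query, matched index, lower break of its interval, and the matched value
cbind(Xtimes, idx, lower_break = Y2[idx], nearest = Ytimes[idx])
```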

    Benchmark:

    mbm <- microbenchmark::microbenchmark(
      for_loop = {
        YmatchIndex <- array(0,length(Xtimes))
        for (i in 1:length(Xtimes)) {
          YmatchIndex[i] = which.min(abs(Ytimes - Xtimes[i]))
        }
      },
      apply    = sapply(Xtimes, function(x){which.min(abs(Ytimes - x))}),
      fndIntvl = {
        Y2 <- c(-Inf, Ytimes + c(diff(Ytimes)/2, Inf))
        Ytimes[ findInterval(Xtimes, Y2) ]
      },
      times = 100
    )
    mbm
    # Unit: microseconds
    #      expr    min     lq     mean  median      uq    max neval
    #  for_loop 2210.5 2346.8 2823.678 2444.80 3029.45 7800.7   100
    #     apply   48.8   58.7  100.455   65.55   91.50 2568.7   100
    #  fndIntvl   18.3   23.4   34.059   29.80   40.30   83.4   100
    ggplot2::autoplot(mbm)
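    Before trusting a timing comparison, it may help to confirm that the fast method returns the same indexes as the loop. This sketch uses random data with a fixed seed; the sizes and ranges are arbitrary assumptions, not the question's data:

```r
# Sketch: check that findInterval() agrees with the brute-force loop.
set.seed(42)
Ytimes <- sort(runif(2000, 0, 120))   # reference values, sorted ascending
Xtimes <- runif(500, -10, 130)        # queries, some outside range(Ytimes)

# brute force: scan all of Ytimes for every query
brute <- integer(length(Xtimes))
for (i in seq_along(Xtimes)) {
  brute[i] <- which.min(abs(Ytimes - Xtimes[i]))
}

# midpoint breaks + interval search
Y2   <- c(-Inf, Ytimes + c(diff(Ytimes) / 2, Inf))
fast <- findInterval(Xtimes, Y2)

identical(brute, fast)   # expected TRUE unless a query lands exactly on a midpoint
```

    One difference shows up only on exact ties: which.min keeps the lower index, whereas findInterval's default left-closed intervals assign a value sitting exactly on a midpoint to the upper neighbour. Passing `left.open = TRUE` to findInterval switches to (a, b] intervals, which should match which.min's tie-breaking.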
    
