Find nearest value in numpy array

被撕碎了的回忆 2020-11-22 10:18

Is there a numpy-thonic way, e.g. function, to find the nearest value in an array?

Example:

np.find_nearest( array, value )
16 answers
  • 2020-11-22 10:39

    For large arrays, the (excellent) answer given by @Demitri is far faster than the answer currently marked as best. I've adapted his exact algorithm in the following two ways:

    1. The function below works whether or not the input array is sorted.

    2. The function below returns the index of the input array corresponding to the closest value, which is somewhat more general.

    Note that the function below also handles a specific edge case that would lead to a bug in the original function written by @Demitri. Otherwise, my algorithm is identical to his.

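    import numpy as np

    def find_idx_nearest_val(array, value):
        array = np.asarray(array)  # accept lists as well as numpy arrays
        idx_sorted = np.argsort(array)
        sorted_array = array[idx_sorted]
        # position at which value would be inserted to keep sorted_array sorted
        idx = np.searchsorted(sorted_array, value, side="left")
        if idx >= len(array):
            # value is above the maximum
            idx_nearest = idx_sorted[len(array)-1]
        elif idx == 0:
            # value is at or below the minimum
            idx_nearest = idx_sorted[0]
        else:
            # compare the two neighbours; exact ties go to the upper one
            if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
                idx_nearest = idx_sorted[idx-1]
            else:
                idx_nearest = idx_sorted[idx]
        return idx_nearest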
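
    When the same array is queried repeatedly, the sort can be done once and the lookups vectorized. The following is a minimal sketch of that idea, not part of the original answer; the name nearest_indices_sorted_once is hypothetical, and it assumes a non-empty 1-D array:

```python
import numpy as np

def nearest_indices_sorted_once(array, values):
    """For each entry of ``values``, return the index in ``array`` of the closest element."""
    array = np.asarray(array)
    values = np.atleast_1d(values)
    order = np.argsort(array)          # original positions, ordered by value
    sorted_array = array[order]
    # Insertion points, clipped so that pos-1 and pos are both valid neighbours.
    pos = np.clip(np.searchsorted(sorted_array, values, side="left"), 1, len(array) - 1)
    left = sorted_array[pos - 1]
    right = sorted_array[pos]
    # Step back to the left neighbour only when it is strictly closer (ties go right).
    pos = pos - (np.abs(values - left) < np.abs(values - right))
    return order[pos]

print(nearest_indices_sorted_once([4.0, 1.0, 9.0, 6.0], [0.0, 5.2, 7.4, 100.0]))
```

    The returned indices refer to the original, unsorted array, as in the function above.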
    
  • 2020-11-22 10:40

    Summary of answer: If you have a sorted array, the bisection code given below is the fastest: ~100-1000 times faster for large arrays and ~2-100 times faster for small arrays. It does not require numpy either. If the array is unsorted and large, consider sorting it first with an O(n log n) sort and then using bisection, which pays off when the same array is queried many times; if it is unsorted and small, method 2 seems the fastest.

    First you should clarify what you mean by nearest value. Often one wants the interval on an abscissa, e.g. for array=[0,0.7,2.1] and value=1.95 the answer would be idx=1. This is the case that I suspect you need (otherwise the following can be modified very easily with a follow-up conditional statement once you find the interval). The optimal way to do this is bisection, which I will provide first; note that it does not require numpy at all and is faster than using numpy functions because they perform redundant operations. Then I will provide a timing comparison against the methods presented here by other users.

    Bisection:

    def bisection(array, value):
        '''Given an ``array`` and a ``value``, return an index j such that ``value`` is between
        array[j] and array[j+1]. ``array`` must be monotonically increasing. j=-1 or j=len(array)
        is returned to indicate that ``value`` is out of range below or above, respectively.'''
        n = len(array)
        if value < array[0]:
            return -1
        elif value > array[n-1]:
            return n
        jl = 0      # Initialize lower
        ju = n - 1  # and upper limits.
        while ju - jl > 1:       # If we are not yet done,
            jm = (ju + jl) >> 1  # compute a midpoint with a bitshift
            if value >= array[jm]:
                jl = jm  # and replace either the lower limit
            else:
                ju = jm  # or the upper limit, as appropriate.
            # Repeat until the test condition is satisfied.
        if value == array[0]:      # edge cases at bottom
            return 0
        elif value == array[n-1]:  # and top
            return n - 1
        else:
            return jl
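
    The same interval lookup can also be sketched with the standard-library bisect module instead of a hand-written loop. This is an illustrative alternative, not the author's code; interval_index is a hypothetical name, and the sequence is assumed to be non-empty and sorted in increasing order:

```python
import bisect

def interval_index(array, value):
    """Index j with array[j] <= value < array[j+1] for a sorted sequence.

    Returns -1 if value is below array[0] and len(array) if it is above
    array[-1]; value == array[-1] maps to len(array) - 1.
    """
    n = len(array)
    if value < array[0]:
        return -1
    if value > array[-1]:
        return n
    # bisect_right counts the elements <= value; the last of them starts the interval.
    return bisect.bisect_right(array, value) - 1

print(interval_index([0, 0.7, 2.1], 1.95))  # the example from the text: 1
```

    The out-of-range conventions (-1 below, len(array) above) are copied from the docstring of the bisection function so the two can be swapped.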
    

    Now I'll define the code from the other answers; each function returns an index:

    import math
    import numpy as np
    
    def find_nearest1(array,value):
        idx,val = min(enumerate(array), key=lambda x: abs(x[1]-value))
        return idx
    
    def find_nearest2(array, values):
        indices = np.abs(np.subtract.outer(array, values)).argmin(0)
        return indices
    
    def find_nearest3(array, values):
        values = np.atleast_1d(values)
        # casting to int64 truncates the differences toward zero before taking abs
        indices = np.abs(np.int64(np.subtract.outer(array, values))).argmin(0)
        return indices
    
    def find_nearest4(array,value):
        idx = (np.abs(array-value)).argmin()
        return idx
    
    
    def find_nearest5(array, value):
        idx_sorted = np.argsort(array)
        sorted_array = np.array(array[idx_sorted])
        idx = np.searchsorted(sorted_array, value, side="left")
        if idx >= len(array):
            idx_nearest = idx_sorted[len(array)-1]
        elif idx == 0:
            idx_nearest = idx_sorted[0]
        else:
            if abs(value - sorted_array[idx-1]) < abs(value - sorted_array[idx]):
                idx_nearest = idx_sorted[idx-1]
            else:
                idx_nearest = idx_sorted[idx]
        return idx_nearest
    
    def find_nearest6(array,value):
        xi = np.argmin(np.abs(np.ceil(array[None].T - value)),axis=0)
        return xi
    

    Now I'll time the code. Note that methods 1, 2, 4 and 5 don't give the interval: they return the point in the array nearest to the value (e.g. 1.55 -> 2 on an integer grid; as written above, method 5 compares both neighbours and resolves exact ties upward). Only methods 3 and 6, and of course bisection, return the interval index in this test.

    array = np.arange(100000)
    val = array[50000]+0.55
    print( bisection(array,val))
    %timeit bisection(array,val)
    print( find_nearest1(array,val))
    %timeit find_nearest1(array,val)
    print( find_nearest2(array,val))
    %timeit find_nearest2(array,val)
    print( find_nearest3(array,val))
    %timeit find_nearest3(array,val)
    print( find_nearest4(array,val))
    %timeit find_nearest4(array,val)
    print( find_nearest5(array,val))
    %timeit find_nearest5(array,val)
    print( find_nearest6(array,val))
    %timeit find_nearest6(array,val)
    
    (50000, 50000)
    100000 loops, best of 3: 4.4 µs per loop
    50001
    1 loop, best of 3: 180 ms per loop
    50001
    1000 loops, best of 3: 267 µs per loop
    [50000]
    1000 loops, best of 3: 390 µs per loop
    50001
    1000 loops, best of 3: 259 µs per loop
    50001
    1000 loops, best of 3: 1.21 ms per loop
    [50000]
    1000 loops, best of 3: 746 µs per loop
    

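    In the run above, bisection takes about 4.4 µs, against 259 µs for the next fastest (method 4) and 180 ms for the slowest (method 1, a pure-Python loop). That is roughly 60 times faster than the best numpy method in this test. For smaller arrays it's ~2-100 times faster.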
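
    Outside IPython, where %timeit is unavailable, a comparable measurement can be sketched with the standard timeit module. The helper time_call and the two small functions below are hypothetical stand-ins written for this sketch (a full scan in the spirit of method 4, and a binary search on an already sorted array); absolute timings will differ between machines:

```python
import timeit
import numpy as np

def time_call(func, *args, repeat=3, number=100):
    """Best per-call time in seconds over ``repeat`` runs of ``number`` calls each."""
    timer = timeit.Timer(lambda: func(*args))
    return min(timer.repeat(repeat=repeat, number=number)) / number

def nearest_argmin(array, value):
    # Full scan of the absolute differences.
    return int(np.abs(array - value).argmin())

def nearest_searchsorted(array, value):
    # Binary search on an already sorted array, then compare the two neighbours.
    pos = int(np.searchsorted(array, value))
    pos = min(max(pos, 1), len(array) - 1)
    return pos - 1 if abs(value - array[pos - 1]) < abs(value - array[pos]) else pos

array = np.arange(100000)
val = array[50000] + 0.55
# Both approaches must agree before their timings are compared.
assert nearest_argmin(array, val) == nearest_searchsorted(array, val)
for func in (nearest_argmin, nearest_searchsorted):
    print(func.__name__, time_call(func, array, val))
```

    Taking the minimum over several repeats mirrors the "best of 3" figure that %timeit reports above.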

  • 2020-11-22 10:40

    Here's an extension to find the nearest vector in an array of vectors.

    import numpy as np
    
    def find_nearest_vector(array, value):
      # Euclidean distance from every row of array to value
      idx = np.linalg.norm(array - value, axis=1).argmin()
      return array[idx]
    
    A = np.random.random((10,2))*100
    """ A = array([[ 34.19762933,  43.14534123],
       [ 48.79558706,  47.79243283],
       [ 38.42774411,  84.87155478],
       [ 63.64371943,  50.7722317 ],
       [ 73.56362857,  27.87895698],
       [ 96.67790593,  77.76150486],
       [ 68.86202147,  21.38735169],
       [  5.21796467,  59.17051276],
       [ 82.92389467,  99.90387851],
       [  6.76626539,  30.50661753]])"""
    pt = [6, 30]
    print(find_nearest_vector(A, pt))
    # with the sample A shown above, this prints the row [6.76626539, 30.50661753]
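
    If several query points are needed at once, the same distance computation can be broadcast over all of them. This is a sketch rather than part of the original answer; find_nearest_vectors is a hypothetical name, and it returns row indices instead of the rows themselves:

```python
import numpy as np

def find_nearest_vectors(array, points):
    """For each row of ``points``, return the index of the closest row of ``array``."""
    array = np.asarray(array, dtype=float)
    points = np.atleast_2d(np.asarray(points, dtype=float))
    # diff[i, j, :] = array[j] - points[i]; shape (n_points, n_rows, n_dims)
    diff = array[None, :, :] - points[:, None, :]
    # Squared distances are enough for argmin, so the square root is skipped.
    sq_dist = np.einsum("ijk,ijk->ij", diff, diff)
    return sq_dist.argmin(axis=1)

A = np.array([[34.2, 43.1], [6.77, 30.5], [82.9, 99.9]])
print(find_nearest_vectors(A, [[6, 30], [80, 100]]))
```

    The intermediate array has one entry per (point, row, dimension) triple, so for very large inputs a spatial index such as a k-d tree may be a better fit.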
    
  • 2020-11-22 10:41

    Here's a version that will handle a non-scalar "values" array:

    import numpy as np
    
    def find_nearest(array, values):
        indices = np.abs(np.subtract.outer(array, values)).argmin(0)
        return array[indices]
    

    Or a version that returns a numeric type (e.g. int, float) if the input is scalar:

    def find_nearest(array, values):
        values = np.atleast_1d(values)
        indices = np.abs(np.subtract.outer(array, values)).argmin(0)
        out = array[indices]
        return out if len(out) > 1 else out[0]
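
    np.subtract.outer materializes a temporary of shape (len(array), len(values)), which can exhaust memory when both inputs are large. One possible workaround, sketched here under the assumption that values is 1-D, is to process the values in chunks; find_nearest_chunked and chunk_size are hypothetical names:

```python
import numpy as np

def find_nearest_chunked(array, values, chunk_size=1024):
    """Like find_nearest, but limits the temporary to len(array) x chunk_size entries."""
    array = np.asarray(array)
    values = np.atleast_1d(values)
    out = np.empty(values.shape[0], dtype=array.dtype)
    for start in range(0, values.shape[0], chunk_size):
        block = values[start:start + chunk_size]
        # Temporary of shape (len(array), len(block)) rather than (len(array), len(values)).
        idx = np.abs(np.subtract.outer(array, block)).argmin(0)
        out[start:start + chunk_size] = array[idx]
    return out

arr = np.array([0.0, 1.0, 2.5, 4.0])
vals = np.array([0.4, 2.0, 3.9, -1.0, 1.8])
print(find_nearest_chunked(arr, vals, chunk_size=2))
```

    The result should match the unchunked version; only the peak memory use changes.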
    
  • 2020-11-22 10:41

    For a 2-D array, to determine the (i, j) position of the nearest element:

    import numpy as np
    def find_nearest(a, a0):
        idx = (np.abs(a - a0)).argmin()
        w = a.shape[1]
        i = idx // w
        j = idx - i * w
        return a[i,j], i, j
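
    The row and column arithmetic can also be delegated to np.unravel_index, which works for any number of dimensions. A minimal sketch, with find_nearest_nd as a hypothetical name:

```python
import numpy as np

def find_nearest_nd(a, a0):
    """Return the element of ``a`` closest to ``a0`` and its index tuple."""
    a = np.asarray(a)
    flat_idx = np.abs(a - a0).argmin()          # index into the flattened array
    idx = np.unravel_index(flat_idx, a.shape)   # tuple with one index per axis
    return a[idx], idx

a = np.array([[1.0, 5.0, 9.0], [2.0, 7.0, 3.5]])
value, idx = find_nearest_nd(a, 3.3)
print(value, idx)
```

    For a 2-D input the index tuple holds the same i and j as the function above.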
    
  • 2020-11-22 10:42

    All the answers here are useful for writing efficient code. However, I have written a small Python script to compare various cases. The best case is when the provided array is already sorted. If you search for the index of the point nearest to a single specified value, the bisect module is the most time-efficient. If you search for the indices corresponding to a whole array of values, numpy's searchsorted is the most efficient.

    import numpy as np
    import bisect
    xarr = np.random.rand(int(1e7))
    
    srt_ind = xarr.argsort()
    xar = xarr.copy()[srt_ind]
    xlist = xar.tolist()
    bisect.bisect_left(xlist, 0.3)
    

    In [63]: %time bisect.bisect_left(xlist, 0.3)
    CPU times: user 0 ns, sys: 0 ns, total: 0 ns
    Wall time: 22.2 µs

    np.searchsorted(xar, 0.3, side="left")
    

    In [64]: %time np.searchsorted(xar, 0.3, side="left")
    CPU times: user 0 ns, sys: 0 ns, total: 0 ns
    Wall time: 98.9 µs

    randpts = np.random.rand(1000)
    np.searchsorted(xar, randpts, side="left")
    

    %time np.searchsorted(xar, randpts, side="left")
    CPU times: user 4 ms, sys: 0 ns, total: 4 ms
    Wall time: 1.2 ms

    If the cost scaled linearly with the number of query points, 1000 separate lookups at 98.9 µs each would take ~100 ms. The single vectorized call takes 1.2 ms, which is ~83X faster.
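
    Note that bisect_left and searchsorted return an insertion point, not the nearest element. A minimal sketch of the follow-up comparison for a sorted Python list is given below; nearest_index_in_sorted_list is a hypothetical name, and the list is assumed to be non-empty:

```python
import bisect

def nearest_index_in_sorted_list(sorted_list, value):
    """Index of the element of a sorted list that is closest to ``value``."""
    pos = bisect.bisect_left(sorted_list, value)
    if pos == 0:
        return 0                       # value is at or below the first element
    if pos == len(sorted_list):
        return len(sorted_list) - 1    # value is above the last element
    before, after = sorted_list[pos - 1], sorted_list[pos]
    # Choose the closer neighbour; exact ties go to the upper one.
    return pos - 1 if value - before < after - value else pos

print(nearest_index_in_sorted_list([0.1, 0.25, 0.4, 0.9], 0.3))
```

    With the names from the script above, srt_ind[nearest_index_in_sorted_list(xlist, 0.3)] would map the position back to an index into the unsorted xarr.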
