Fast check for NaN in NumPy

时光说笑 2021-01-30 02:39

I'm looking for the fastest way to check for the occurrence of NaN (np.nan) in a NumPy array X. np.isnan(X) is out of the question, since

7 answers
  •  野趣味 (original poster)
    2021-01-30 03:25

    If you're comfortable with numba, it lets you write a fast short-circuiting function (one that stops as soon as a NaN is found):

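    import numba as nb
    import math

    @nb.njit
    def anynan(array):
        # Flatten so a single loop covers arrays of any dimensionality.
        array = array.ravel()
        for i in range(array.size):
            # Return at the first NaN instead of scanning the rest.
            if math.isnan(array[i]):
                return True
        return False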
    

    If the array contains no NaN, the function may actually be slower than np.min. I think that's because np.min runs a vectorized (SIMD) reduction loop on large arrays rather than anything like multiprocessing:

    import numpy as np
    array = np.random.random(2000000)
    
    %timeit anynan(array)          # 100 loops, best of 3: 2.21 ms per loop
    %timeit np.isnan(array.sum())  # 100 loops, best of 3: 4.45 ms per loop
    %timeit np.isnan(array.min())  # 1000 loops, best of 3: 1.64 ms per loop
    

    But if there is a NaN in the array, especially at a low index, the function is much faster:

    array = np.random.random(2000000)
    array[100] = np.nan
    
    %timeit anynan(array)          # 1000000 loops, best of 3: 1.93 µs per loop
    %timeit np.isnan(array.sum())  # 100 loops, best of 3: 4.57 ms per loop
    %timeit np.isnan(array.min())  # 1000 loops, best of 3: 1.65 ms per loop
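
The two NumPy one-liners timed above work because NaN propagates through the sum and min reductions, so the scalar result is NaN whenever any element is. The following is a minimal sketch of that behaviour on small made-up arrays (not the benchmark data); the variable names are only illustrative.

```python
import numpy as np

# Small illustrative arrays (assumed for this sketch, not from the benchmark).
clean = np.array([0.5, 1.5, 2.0])
dirty = np.array([0.5, np.nan, 2.0])

# Arithmetic involving NaN yields NaN, so the sum is NaN
# as soon as one element is NaN.
sum_flags = (bool(np.isnan(clean.sum())), bool(np.isnan(dirty.sum())))

# ndarray.min() also propagates NaN (np.nanmin is the variant that ignores it).
min_flags = (bool(np.isnan(clean.min())), bool(np.isnan(dirty.min())))

# Caveat: +inf and -inf in the same array also sum to NaN, so the
# sum-based check can give a false positive; the min-based check does not.
mixed = np.array([np.inf, -np.inf])
with np.errstate(invalid="ignore"):  # silence the inf - inf warning
    mixed_flags = (bool(np.isnan(mixed.sum())), bool(np.isnan(mixed.min())))

print(sum_flags, min_flags, mixed_flags)
```

Both reductions have to visit every element, which is consistent with their timings staying the same whether or not a NaN is present.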
    

    Similar results can be achieved with Cython or a C extension. These are a bit more complicated to write (or readily available as bottleneck.anynan), but ultimately do the same thing as my anynan function.
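
For readers who would rather not compile anything, here is a hedged sketch of calling bottleneck.anynan, assuming the optional bottleneck package is installed. The name has_nan and the plain-NumPy fallback are assumptions added for illustration so that the snippet still runs without bottleneck; the fallback builds a full boolean mask and does not short-circuit.

```python
import numpy as np

try:
    # bottleneck is an optional third-party package.
    import bottleneck as bn
    has_nan = bn.anynan
except ImportError:
    # Fallback (assumption for illustration only): plain NumPy.
    # It allocates a boolean mask and always scans the whole array.
    def has_nan(a):
        return bool(np.isnan(a).any())

array = np.random.random(1000)    # values in [0, 1), so no NaN yet
before = bool(has_nan(array))

array[100] = np.nan               # plant a NaN at a low index
after = bool(has_nan(array))

print(before, after)
```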
