Fast check for NaN in NumPy

时光说笑 2021-01-30 02:39

I'm looking for the fastest way to check for the occurrence of NaN (np.nan) in a NumPy array X. np.isnan(X) is out of the question, since

7 answers
  •  醉酒成梦
    2021-01-30 03:35

    There are two general approaches here:

    • Check each array item for NaN and reduce the result with any.
    • Apply some cumulative operation that propagates NaNs (like sum) and check its result.
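
    As a minimal sketch of the two approaches above (the function names are hypothetical, and the arrays are assumed to be non-empty floating-point arrays, since numpy.min raises on an empty array):

```python
import numpy as np


def has_nan_elementwise(x):
    # Approach 1: build a boolean mask with np.isnan, then reduce it with any().
    return bool(np.isnan(x).any())


def has_nan_reduction(x):
    # Approach 2: np.min propagates NaN, so only the scalar result is tested.
    # Assumes x is non-empty; np.min raises ValueError on an empty array.
    return bool(np.isnan(np.min(x)))


clean = np.array([1.0, 2.0, 3.0])
dirty = np.array([1.0, np.nan, 3.0])

print(has_nan_elementwise(clean), has_nan_elementwise(dirty))
print(has_nan_reduction(clean), has_nan_reduction(dirty))
```

    The first version allocates a temporary boolean array the size of the input; the second only produces a scalar, which is one reason a reduction can be competitive.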

    While the first approach is certainly the cleanest, some of the cumulative operations are so heavily optimized (particularly those executed in BLAS, like dot) that they can be quite fast. Note that dot, like some other BLAS operations, is multithreaded under certain conditions, which explains why the relative speeds differ between machines.

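    import numpy
    import perfplot


    # Note: min, sum and any below shadow the Python builtins within this script.

    # NaN propagates through numpy.min, so the result is NaN if any item is.
    def min(a):
        return numpy.isnan(numpy.min(a))


    # Same idea using the sum of all items.
    def sum(a):
        return numpy.isnan(numpy.sum(a))


    # Same idea using the dot product of a with itself (BLAS-backed).
    def dot(a):
        return numpy.isnan(numpy.dot(a, a))


    # Elementwise check: build a boolean mask, then reduce it with any.
    def any(a):
        return numpy.any(numpy.isnan(a))


    # Sum of all items expressed as an einsum contraction.
    def einsum(a):
        return numpy.isnan(numpy.einsum("i->", a))


    # Benchmark all kernels on NaN-free random arrays of length 1 to 2**19.
    perfplot.show(
        setup=lambda n: numpy.random.rand(n),
        kernels=[min, sum, dot, any, einsum],
        n_range=[2 ** k for k in range(20)],
        logx=True,
        logy=True,
        xlabel="len(a)",
    )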
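
    If perfplot is not installed, a rougher comparison can be sketched with the standard-library timeit module. This is only an illustration: the CHECKS dictionary and the time_checks helper are hypothetical names, and the timings will vary by machine and NumPy build.

```python
import timeit

import numpy as np

# Hypothetical mapping of labels to checks; each returns True if x holds a NaN.
CHECKS = {
    "min": lambda x: np.isnan(np.min(x)),
    "sum": lambda x: np.isnan(np.sum(x)),
    "dot": lambda x: np.isnan(np.dot(x, x)),
    "any": lambda x: np.any(np.isnan(x)),
}


def time_checks(n, repeats=20):
    """Return average seconds per call for each check on a NaN-free array."""
    x = np.random.rand(n)
    return {
        name: timeit.timeit(lambda: check(x), number=repeats) / repeats
        for name, check in CHECKS.items()
    }


if __name__ == "__main__":
    for name, seconds in time_checks(2 ** 16).items():
        print(f"{name:>4}: {seconds * 1e6:8.1f} us per call")
```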
    
