I'm looking for the fastest way to check for the occurrence of NaN (np.nan) in a NumPy array X. np.isnan(X)
is out of the question, since it builds a boolean array of shape X.shape, which is potentially gigantic.
There are two general approaches here:

1. Check each array item for nan and take any.
2. Apply some cumulative operation that preserves nans (like sum) and check its result.

While the first approach is certainly the cleanest, the heavy optimization of some of the cumulative operations (particularly the ones that are executed in BLAS, like dot) can make those quite fast. Note that dot, like some other BLAS operations, is multithreaded under certain conditions. This explains the difference in speed between different machines.
import numpy
import perfplot

# Each kernel returns True if `a` contains at least one NaN.
# Note: the names min, sum and any shadow Python built-ins within this script.
def min(a):
    return numpy.isnan(numpy.min(a))

def sum(a):
    return numpy.isnan(numpy.sum(a))

def dot(a):
    return numpy.isnan(numpy.dot(a, a))

def any(a):
    return numpy.any(numpy.isnan(a))

def einsum(a):
    return numpy.isnan(numpy.einsum("i->", a))

# Time every kernel on random 1-D arrays of length 1 up to 2**19.
perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[min, sum, dot, any, einsum],
    n_range=[2 ** k for k in range(20)],
    logx=True,
    logy=True,
    xlabel="len(a)",
)
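
The benchmark only feeds the kernels NaN-free random data, so it may help to see the two approaches agree on small inputs. The following is a minimal sketch rather than part of the benchmark: the helper names has_nan_any and has_nan_dot are made up for illustration, and flattening with ravel is an assumption added here so that the dot variant also accepts multi-dimensional arrays.

```python
import numpy as np

def has_nan_any(x):
    """Elementwise check: builds a temporary boolean array, then reduces it."""
    return bool(np.any(np.isnan(x)))

def has_nan_dot(x):
    """Reduction check: a NaN in x propagates through the dot product of x with itself."""
    flat = np.ravel(x)  # 1-D view of x; a copy is made only if x is not contiguous
    return bool(np.isnan(np.dot(flat, flat)))

clean = np.arange(6, dtype=float).reshape(2, 3)
dirty = clean.copy()
dirty[1, 2] = np.nan

print(has_nan_any(clean), has_nan_dot(clean))
print(has_nan_any(dirty), has_nan_dot(dirty))

# Caveat for the sum-based variant: inf + (-inf) evaluates to NaN, so it can
# report a NaN that is not actually present in the input.
mixed = np.array([np.inf, -np.inf])
with np.errstate(invalid="ignore"):  # silence the floating-point warning
    sum_says_nan = bool(np.isnan(np.sum(mixed)))
print(has_nan_any(mixed), sum_says_nan)
```

One design note: squaring each element, as the dot variant does, cannot turn infinities into NaN for real-valued input, whereas sum can when the array holds infinities of both signs. If such input is possible, the sum variant is better treated as a fast pre-check than as a definitive answer.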