Detect if a NumPy array contains at least one non-numeric value?

逝去的感伤 2021-01-30 07:46

I need to write a function which will detect if the input contains at least one value which is non-numeric. If a non-numeric value is found I will raise an error (because the ca

5 Answers
  • 2021-01-30 08:21

    (np.where(np.isnan(A)))[0].shape[0] is greater than 0 whenever A contains at least one nan. A can have any shape, for example an n x m matrix.

    Example:

    import numpy as np
    
    A = np.array([1, 2, 4, np.nan])
    
    # np.where returns a tuple of index arrays; a non-empty first array means a nan was found
    if np.where(np.isnan(A))[0].shape[0]:
        print("A contains nan")
    else:
        print("A does not contain nan")
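To illustrate the n x m case mentioned above, here is a minimal sketch using a made-up 2 x 3 matrix. The names `rows`, `cols`, `nan_count` and `has_nan` are illustrative only. It shows that `np.where` on the boolean mask returns one index array per dimension, so the length of the first one is the number of nan entries.

```python
import numpy as np

# Hypothetical 2 x 3 example matrix with a single nan at row 1, column 2.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, np.nan]])

# For a 2-D mask, np.where returns a (row_indices, col_indices) tuple.
rows, cols = np.where(np.isnan(A))

nan_count = rows.shape[0]   # number of nan entries found
has_nan = nan_count > 0

print(has_nan)
print(list(zip(rows.tolist(), cols.tolist())))  # positions of the nan entries
```

As a side effect, this approach also tells you where the nan values are, which can be handy for an error message.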
    
  • 2021-01-30 08:29

    With NumPy 1.3 or the svn version, you can do this:

    In [1]: a = arange(10000.).reshape(100,100)
    
    In [3]: isnan(a.max())
    Out[3]: False
    
    In [4]: a[50,50] = nan
    
    In [5]: isnan(a.max())
    Out[5]: True
    
    In [6]: timeit isnan(a.max())
    10000 loops, best of 3: 66.3 µs per loop
    

    The treatment of nans in comparisons was not consistent in earlier versions.
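The trick above relies on `max` propagating nan. A small sketch of that behaviour in a current NumPy, written with explicit `np.` prefixes instead of the pylab-style names used in the session; `np.nanmax` is shown only for contrast, since it ignores nan and so cannot be used for detection. The variable names are illustrative.

```python
import numpy as np

a = np.arange(10000.0).reshape(100, 100)
a[50, 50] = np.nan

# max() propagates nan, so a nan anywhere in the array makes the result nan.
max_is_nan = bool(np.isnan(a.max()))

# nanmax() skips nan entries and returns the largest remaining value.
nanmax_value = float(np.nanmax(a))

print(max_is_nan, nanmax_value)
```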

  • 2021-01-30 08:34

    This should be faster than iterating and will work regardless of shape.

    numpy.isnan(myarray).any()
    

    Edit: 30x faster:

    import timeit
    s = 'import numpy;a = numpy.arange(10000.).reshape((100,100));a[10,10]=numpy.nan'
    ms = [
        'numpy.isnan(a).any()',
        'any(numpy.isnan(x) for x in a.flatten())']
    for m in ms:
        print("  %.2f s" % timeit.Timer(m, s).timeit(1000), m)
    

    Results:

      0.11 s numpy.isnan(a).any()
      3.75 s any(numpy.isnan(x) for x in a.flatten())
    

    Bonus: it works fine for non-array NumPy types:

    >>> a = numpy.float64(42.)
    >>> numpy.isnan(a).any()
    False
    >>> a = numpy.float64(numpy.nan)
    >>> numpy.isnan(a).any()
    True
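One caveat for the original question: `numpy.isnan` raises a `TypeError` on string or object arrays, so truly non-numeric input needs a dtype check first. The following is a minimal sketch under that assumption. `contains_nan` is a hypothetical helper name, and treating boolean input as non-numeric is a choice made here, not something the answer specifies.

```python
import numpy as np

def contains_nan(values):
    """Return True if `values` holds at least one nan.

    Hypothetical helper: rejects non-numeric input with a TypeError
    before calling np.isnan, which does not accept such dtypes.
    """
    arr = np.asarray(values)
    # Strings, objects and booleans are not subtypes of np.number.
    if not np.issubdtype(arr.dtype, np.number):
        raise TypeError(f"expected a numeric dtype, got {arr.dtype}")
    # Integer dtypes cannot represent nan, so there is nothing to scan.
    if np.issubdtype(arr.dtype, np.integer):
        return False
    return bool(np.isnan(arr).any())
```

Usage might look like `contains_nan([1.0, np.nan])`, which returns True, while `contains_nan(["a", "b"])` raises `TypeError`.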
    
  • 2021-01-30 08:37

    Pfft! Microseconds! Never solve a problem in microseconds that can be solved in nanoseconds.

    Note that the accepted answer:

    • iterates over the whole data, regardless of whether a nan is found
    • creates a temporary array of size N, which is redundant.

    A better solution is to return True as soon as a nan is found:

    import numba
    import numpy as np
    
    NAN = float("nan")
    
    @numba.njit(nogil=True)
    def _any_nans(a):
        for x in a:
            if np.isnan(x): return True
        return False
    
    @numba.jit
    def any_nans(a):
        if not a.dtype.kind=='f': return False
        return _any_nans(a.flat)
    
    array1M = np.random.rand(1000000)
    assert any_nans(array1M)==False
    %timeit any_nans(array1M)  # 573us
    
    array1M[0] = NAN
    assert any_nans(array1M)==True
    %timeit any_nans(array1M)  # 774ns  (!nanoseconds)
    

    and works for n-dimensions:

    array1M_nd = array1M.reshape((len(array1M)//2, 2))
    assert any_nans(array1M_nd)==True
    %timeit any_nans(array1M_nd)  # 774ns
    

    Compare this to the numpy native solution:

    def any_nans(a):
        if not a.dtype.kind=='f': return False
        return np.isnan(a).any()
    
    array1M = np.random.rand(1000000)
    assert any_nans(array1M)==False
    %timeit any_nans(array1M)  # 456us
    
    array1M[0] = NAN
    assert any_nans(array1M)==True
    %timeit any_nans(array1M)  # 470us
    
    %timeit np.isnan(array1M).any()  # 532us
    

    The early-exit method gives a speedup of three orders of magnitude (in some cases, namely when a nan appears early in the data). Not too shabby for a simple annotation.
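If numba is not available, the same early-exit idea can be approximated in plain NumPy by scanning the data in chunks. This is a sketch, not the method from the answer above: `any_nans_chunked` and the default `chunk_size` are assumptions made for illustration, and whether it beats `np.isnan(a).any()` depends on where (and whether) a nan occurs.

```python
import numpy as np

def any_nans_chunked(a, chunk_size=65536):
    """Hypothetical helper: stop scanning at the first chunk that contains a nan."""
    if a.dtype.kind != 'f':
        return False  # same dtype guard as in the answer above
    flat = a.reshape(-1)  # a view for contiguous arrays, a copy otherwise
    for start in range(0, flat.size, chunk_size):
        # Only a chunk-sized temporary mask is created per iteration.
        if np.isnan(flat[start:start + chunk_size]).any():
            return True
    return False

array1M = np.random.rand(1000000)
print(any_nans_chunked(array1M))   # no nan present, full scan

array1M[0] = np.nan
print(any_nans_chunked(array1M))   # nan in the first chunk, returns early
```

The temporary array is bounded by `chunk_size` rather than N, which addresses the second objection in the list above as well.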

  • 2021-01-30 08:38

    If infinity is a possible value, I would use numpy.isfinite

    numpy.isfinite(myarray).all()
    

    If the above evaluates to True, then myarray contains no numpy.nan, numpy.inf or -numpy.inf values.

    numpy.isnan, by contrast, does not flag numpy.inf values, for example:

    In [11]: import numpy as np
    
    In [12]: b = np.array([[4, np.inf],[np.nan, -np.inf]])
    
    In [13]: np.isnan(b)
    Out[13]: 
    array([[False, False],
           [ True, False]], dtype=bool)
    
    In [14]: np.isfinite(b)
    Out[14]: 
    array([[ True, False],
           [False, False]], dtype=bool)
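Since the question wants an error raised on bad input, the `isfinite` check can be wrapped in a small validator. A minimal sketch, assuming nan and infinities should both be rejected; `require_finite` is a hypothetical name and the error message format is made up for illustration.

```python
import numpy as np

def require_finite(values):
    """Hypothetical validator: raise ValueError if any element is nan, inf or -inf."""
    # Conversion itself raises ValueError for input that cannot be read as float.
    arr = np.asarray(values, dtype=float)
    finite = np.isfinite(arr)
    if not finite.all():
        bad = np.argwhere(~finite)                 # one row of indices per bad element
        first = tuple(int(i) for i in bad[0])
        raise ValueError(f"{len(bad)} non-finite value(s), first at index {first}")
    return arr

b = np.array([[4, np.inf], [np.nan, -np.inf]])
try:
    require_finite(b)
except ValueError as exc:
    message = str(exc)
    print(message)
```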
    