Comparing NumPy arrays containing NaN

情话喂你 2020-11-28 09:27

For my unittest, I want to check if two arrays are identical. Reduced example:

a = np.array([1, 2, np.nan])  # np.NaN was removed in NumPy 2.0; np.nan works everywhere
b = np.array([1, 2, np.nan])
if np.all(a==b):
          


        
7 Answers
  • 2020-11-28 10:02

    You could use numpy masked arrays, mask the NaN values and then use numpy.ma.all or numpy.ma.allclose:

    http://docs.scipy.org/doc/numpy/reference/generated/numpy.ma.all.html

    http://docs.scipy.org/doc/numpy/reference/generated/numpy.ma.allclose.html

    For example:

    a = np.array([1, 2, np.nan])
    b = np.array([1, 2, np.nan])
    np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(b))  # True
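
One caveat with this approach, sketched below: comparing two masked arrays gives a masked result wherever either side is masked, and np.ma.all skips masked entries, so a NaN on one side paired with an ordinary number on the other can still pass. The helper masked_equal is a hypothetical name (not part of NumPy) that additionally requires the masks to match. Note that masked_invalid masks infinities as well as NaN.

```python
import numpy as np

def masked_equal(a, b):
    """Hypothetical helper: equal where valid, and invalid in the same places."""
    ma = np.ma.masked_invalid(a)
    mb = np.ma.masked_invalid(b)
    if ma.shape != mb.shape:
        return False
    # The NaN/inf positions must coincide, otherwise NaN-vs-number would pass.
    same_mask = np.array_equal(np.ma.getmaskarray(ma), np.ma.getmaskarray(mb))
    # Fill masked entries with True so that only valid entries decide the result.
    same_data = bool(np.ma.filled(ma == mb, True).all())
    return same_mask and same_data

a = np.array([1, 2, np.nan])
b = np.array([1, 2, np.nan])
c = np.array([1, 2, 3])

# The mask-only comparison from the answer, applied to arrays that differ.
naive_a_c = bool(np.ma.all(np.ma.masked_invalid(a) == np.ma.masked_invalid(c)))

print(masked_equal(a, b), masked_equal(a, c), naive_a_c)
```

If infinities should be compared by value rather than ignored, masking with np.isnan instead of masked_invalid would be one way to adapt this.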
    
  • 2020-11-28 10:09

    If this is for something like unit tests, where performance and "correct" behaviour for every type matter less, you can use the following, which works with all types of arrays, not just numeric ones:

    a = np.array(['a', 'b', None])
    b = np.array(['a', 'b', None])
    assert list(a) == list(b)
    

    Casting ndarrays to lists can sometimes be useful to get the behaviour you want in some tests. (But don't use this in production code, or with larger arrays!)
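
A caveat worth knowing, shown as a small sketch: the list comparison succeeds for the object array above because None == None, but for a float array list(a) yields separate np.float64 NaN objects, and those compare unequal. So the trick does not by itself solve the NaN case from the question. The variable names are illustrative only.

```python
import numpy as np

# Object arrays: elements are ordinary Python objects, and None == None.
a = np.array(['a', 'b', None])
b = np.array(['a', 'b', None])
lists_equal_obj = list(a) == list(b)

# Float arrays: each list element is a fresh np.float64, and NaN != NaN.
x = np.array([1.0, 2.0, np.nan])
y = np.array([1.0, 2.0, np.nan])
lists_equal_float = list(x) == list(y)

print(lists_equal_obj, lists_equal_float)
```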

  • 2020-11-28 10:14

    When I used the above answer:

     ((a == b) | (numpy.isnan(a) & numpy.isnan(b))).all()
    

    it raised errors when evaluating lists of strings.

    This version is more type-generic:

    def EQUAL(a, b):
        # Elementwise result: values are equal, or both are NaN (only NaN satisfies x != x).
        return ((a == b) | ((a != a) & (b != b)))
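
Since EQUAL returns an elementwise boolean array, a test usually needs it reduced to a single value. A minimal sketch, assuming array-like inputs; nan_aware_equal is a hypothetical wrapper name, and the shape check is an addition that guards against silent broadcasting:

```python
import numpy as np

def nan_aware_equal(a, b):
    """Hypothetical wrapper: True if a and b match elementwise, NaN == NaN."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        return False
    # x != x is True only for NaN, so this also works for non-float dtypes.
    elementwise = (a == b) | ((a != a) & (b != b))
    return bool(np.all(elementwise))

floats_same = nan_aware_equal([1.0, 2.0, np.nan], [1.0, 2.0, np.nan])
floats_diff = nan_aware_equal([1.0, 2.0, np.nan], [1.0, 3.0, np.nan])
strings_same = nan_aware_equal(['a', 'b'], ['a', 'b'])

print(floats_same, floats_diff, strings_same)
```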
    
  • 2020-11-28 10:20

    The easiest way is to use numpy.allclose(), which lets you specify how NaN values are handled. Your example would then look like this:

    a = np.array([1, 2, np.nan])
    b = np.array([1, 2, np.nan])
    
    if np.allclose(a, b, equal_nan=True):
        print('arrays are equal')
    

    This prints "arrays are equal".

    See the numpy.allclose documentation for details.

  • 2020-11-28 10:21

    Alternatively you can use numpy.testing.assert_equal or numpy.testing.assert_array_equal with a try/except:

    In : import numpy as np
    
    In : def nan_equal(a,b):
    ...:     try:
    ...:         np.testing.assert_equal(a,b)
    ...:     except AssertionError:
    ...:         return False
    ...:     return True
    
    In : a=np.array([1, 2, np.nan])
    
    In : b=np.array([1, 2, np.nan])
    
    In : nan_equal(a,b)
    Out: True
    
    In : a=np.array([1, 2, np.nan])
    
    In : b=np.array([3, 2, np.nan])
    
    In : nan_equal(a,b)
    Out: False
    

    Edit

    Since you are using this for unit testing, a bare assert (instead of wrapping it to get True/False) might be more natural.
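
As a sketch of what the bare-assert style could look like inside a unittest test case (the class and method names are made up for illustration): np.testing.assert_array_equal treats NaNs in matching positions as equal and raises AssertionError on a mismatch, so it plugs directly into unittest.

```python
import unittest
import numpy as np

class NanArrayTest(unittest.TestCase):
    """Illustrative test case; names are hypothetical."""

    def test_same_values_with_nan(self):
        a = np.array([1, 2, np.nan])
        b = np.array([1, 2, np.nan])
        # Passes: NaNs in the same positions count as equal here.
        np.testing.assert_array_equal(a, b)

    def test_different_values_are_reported(self):
        a = np.array([1, 2, np.nan])
        b = np.array([3, 2, np.nan])
        # A mismatch surfaces as an ordinary AssertionError.
        with self.assertRaises(AssertionError):
            np.testing.assert_array_equal(a, b)

# Run the two tests programmatically so the snippet works when pasted anywhere.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NanArrayTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real test module you would normally drop the last two lines and let python -m unittest discover the class.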

  • 2020-11-28 10:23

    Just to complete @Luis Albert Centeno’s answer, you may prefer to use:

    np.allclose(a, b, rtol=0, atol=0, equal_nan=True)
    

    rtol and atol control the tolerance of the equality test. In short, allclose() returns:

    all(abs(a - b) <= atol + rtol * abs(b))
    

    By default they are not 0 (rtol=1e-05 and atol=1e-08), so the function can return True when your numbers are close but not exactly equal.
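
A short sketch of the difference the tolerances make. The values are chosen for illustration, and the last line assumes NumPy 1.19 or later, where np.array_equal accepts equal_nan as an exact-comparison alternative:

```python
import numpy as np

a = np.array([1.0, 2.0, np.nan])
b = np.array([1.0, 2.0 + 1e-9, np.nan])  # differs from a by a tiny amount

# Default tolerances: the 1e-9 difference is within atol + rtol * abs(b).
loose = bool(np.allclose(a, b, equal_nan=True))

# Zero tolerances: only exactly equal values (and matching NaNs) pass.
strict = bool(np.allclose(a, b, rtol=0, atol=0, equal_nan=True))

# Assumes NumPy >= 1.19: exact comparison that treats NaNs as equal.
exact = bool(np.array_equal(a, b, equal_nan=True))

print(loose, strict, exact)
```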


    PS: "I want to check if two arrays are identical": strictly speaking, you are looking for equality rather than identity. The two are not the same in Python, and it helps everyone to keep the distinction clear so that we share the same vocabulary. (https://www.blog.pythonlibrary.org/2017/02/28/python-101-equality-vs-identity/)

    You’d test identity with the is keyword:

    a is b
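
To make the distinction concrete, here is a minimal sketch (variable names are illustrative, and np.array_equal with equal_nan assumes NumPy 1.19 or later): an alias is identical to the original array, while a copy is equal in content but a different object.

```python
import numpy as np

a = np.array([1, 2, np.nan])
alias = a             # a second name for the same object
duplicate = a.copy()  # same contents, separate object

same_object = alias is a
copy_is_distinct = duplicate is not a
# Assumes NumPy >= 1.19 for the equal_nan parameter.
copy_is_equal = bool(np.array_equal(a, duplicate, equal_nan=True))

print(same_object, copy_is_distinct, copy_is_equal)
```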
    