Question
In Python, pandas and NumPy: why do NaN and None comparisons give different results?
from pandas import Series
from numpy import NaN
NaN is not equal to NaN:
>>> NaN == NaN
False
but NaN inside a list or tuple is:
>>> [NaN] == [NaN], (NaN,) == (NaN,)
(True, True)
While Series containing NaN are not equal again:
>>> Series([NaN]) == Series([NaN])
0    False
dtype: bool
And None:
>>> None == None, [None] == [None]
(True, True)
While inside a Series it is not:
>>> Series([None]) == Series([None])
0    False
dtype: bool
This answer explains why NaN == NaN is False in general, but it does not explain the behaviour inside Python and pandas collections.
Answer 1:
As explained here, and here, and in the Python docs, when sequences are checked for equality, element identity is compared first, and element comparison is performed only for distinct elements.
Because np.nan and np.NaN refer to the same object, i.e. (np.nan is np.nan is np.NaN) == True, the equality [np.nan] == [np.nan] holds. On the other hand, float('nan') creates a new object on every call, so [float('nan')] == [float('nan')] is False.
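The identity shortcut can be reproduced without NumPy. The sketch below is only an illustration: the class name `NeverEqual` is made up for this example, and it relies on the documented CPython behaviour that list and tuple comparison (and the `in` operator) treat identical objects as equal before falling back to `==`.

```python
class NeverEqual:
    """Hypothetical object that, like NaN, never compares equal to anything."""

    def __eq__(self, other):
        return False


x = NeverEqual()
nan = float("nan")

# Direct comparison calls __eq__, so the object is not equal to itself.
same_object_scalar = (x == x)

# Inside a list, the identity check (x is x) short-circuits __eq__.
same_object_in_list = ([x] == [x])

# The same NaN object on both sides: identity makes the lists equal.
same_nan_in_list = ([nan] == [nan])

# Two separately created NaN objects: identity fails, then nan == nan fails.
distinct_nan_in_list = ([float("nan")] == [float("nan")])

# Membership tests use the same "identity first" rule.
nan_membership = nan in [nan]

print(same_object_scalar, same_object_in_list,
      same_nan_in_list, distinct_nan_in_list, nan_membership)
```

So the list results in the question depend on whether both lists hold the very same NaN object, not on the value NaN itself.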
pandas and NumPy compare values element by element and do not have this problem:
>>> pd.Series([np.NaN]).eq(pd.Series([np.NaN]))[0], (pd.Series([np.NaN]) == pd.Series([np.NaN]))[0]
(False, False)
However, the special equals method treats NaNs in the same location as equal:
>>> pd.Series([np.NaN]).equals(pd.Series([np.NaN]))
True
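As a small sketch of the difference between the two, the snippet below compares the same pair of Series with `==` and with `equals`. The variable names are illustrative, and it uses `np.nan` rather than the `np.NaN` alias, which was removed in NumPy 2.0.

```python
import numpy as np
import pandas as pd

a = pd.Series([1.0, np.nan, 3.0])
b = pd.Series([1.0, np.nan, 3.0])
c = pd.Series([1.0, np.nan, 4.0])

# Elementwise ==: the NaN position compares as False (IEEE 754 semantics).
elementwise = (a == b).tolist()

# equals: one boolean for the whole Series, NaNs in the same position match.
whole_equal = a.equals(b)

# equals still reports a real difference in the non-missing values.
whole_different = a.equals(c)

print(elementwise, whole_equal, whole_different)
```

`==` answers "which positions match?", while `equals` answers "are these two objects the same data?", which is usually what a test assertion wants.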
None is treated differently. NumPy considers them equal:
>>> pd.Series([None, None]).values == (pd.Series([None, None])).values
array([ True,  True])
While pandas does not:
>>> pd.Series([None, None]) == (pd.Series([None, None]))
0    False
1    False
dtype: bool
There is also an inconsistency between the == operator and the eq method, which is discussed here:
>>> pd.Series([None, None]).eq(pd.Series([None, None]))
0    True
1    True
dtype: bool
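Since the result for missing values can depend on the operator and on the pandas version, one way to avoid relying on either is to handle missing values explicitly. The helper below is a hypothetical sketch, not a pandas API: `null_safe_eq` is a name made up here, and it assumes both Series share the same index.

```python
import numpy as np
import pandas as pd


def null_safe_eq(left, right):
    """Hypothetical helper: elementwise equality in which two missing
    values (NaN or None) in the same position count as equal.
    Assumes both Series have identical indexes."""
    both_missing = left.isna() & right.isna()
    return (left == right) | both_missing


floats = null_safe_eq(pd.Series([1.0, np.nan, 3.0]),
                      pd.Series([1.0, np.nan, 4.0]))

objects = null_safe_eq(pd.Series([None, "a"], dtype=object),
                       pd.Series([None, "b"], dtype=object))

print(floats.tolist(), objects.tolist())
```

Because `isna()` recognises both NaN and None, the missing positions give True here whatever `==` returns for them.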
Tested on pandas 0.23.4 and numpy 1.15.0.
Source: https://stackoverflow.com/questions/52436356/pandas-numpy-nan-none-comparison