I'm working with boolean indexing in Pandas. The question is why the statement:
a[(a['some_column']==some_number) & (a['some_other_column']==some_other
Logical operators for boolean indexing in Pandas
It's important to realize that you cannot use any of the Python logical operators (and, or, or not) on pandas.Series or pandas.DataFrame objects (similarly, you cannot use them on numpy.array objects with more than one element). You cannot use them because they implicitly call bool on their operands, which raises an exception because these data structures treat the truth value of an array as ambiguous:
>>> import numpy as np
>>> import pandas as pd
>>> arr = np.array([1,2,3])
>>> s = pd.Series([1,2,3])
>>> df = pd.DataFrame([1,2,3])
>>> bool(arr)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
>>> bool(s)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
>>> bool(df)
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
I did cover this more extensively in my answer to the "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" Q+A.
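As a minimal sketch of what this means for boolean indexing (the Series values are made up for illustration), combining two comparisons with and fails, while the element-wise & operator works:

```python
import pandas as pd

s = pd.Series([1, 2, 3])  # illustrative data, not from the question

# `and` first evaluates bool(s > 1), which is ambiguous for a Series.
try:
    result = (s > 1) and (s < 3)
    raised = False
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# `&` combines the two boolean Series element by element.
elementwise = (s > 1) & (s < 3)
print(elementwise.tolist())  # [False, True, False]
```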
However, NumPy provides element-wise equivalents to these operators as functions that can be used on numpy.array, pandas.Series, pandas.DataFrame, or any other (conforming) numpy.array subclass:
- and has np.logical_and
- or has np.logical_or
- not has np.logical_not
- np.logical_xor has no Python logical-operator equivalent; it is a logical "exclusive or"
So, essentially, one should use (assuming df1 and df2 are pandas DataFrames):
np.logical_and(df1, df2)
np.logical_or(df1, df2)
np.logical_not(df1)
np.logical_xor(df1, df2)
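To connect this to boolean indexing, here is a small sketch that uses np.logical_and to build a row mask. The DataFrame and its column names x and y are hypothetical; they only stand in for whatever columns you are filtering on:

```python
import numpy as np
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"x": [1, 2, 1, 3], "y": [10, 20, 30, 10]})

# Element-wise "and" of two boolean Series, usable as a row mask.
mask = np.logical_and(df["x"] == 1, df["y"] == 10)
selected = df[mask]

print(mask.tolist())            # [True, False, False, False]
print(selected.index.tolist())  # [0]
```

For boolean inputs like these, the same mask can be built with (df["x"] == 1) & (df["y"] == 10).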
However, if you have boolean NumPy arrays, pandas Series, or pandas DataFrames, you can also use the element-wise bitwise functions (for booleans they are, or at least should be, indistinguishable from the logical functions):
- bitwise and: np.bitwise_and or the & operator
- bitwise or: np.bitwise_or or the | operator
- bitwise not: np.invert (or the alias np.bitwise_not) or the ~ operator
- bitwise xor: np.bitwise_xor or the ^ operator
Typically the operators are used. However, when combined with comparison operators, one has to remember to wrap each comparison in parentheses, because the bitwise operators have a higher precedence than the comparison operators:
(df1 < 10) | (df2 > 10) # instead of the wrong df1 < 10 | df2 > 10
This may be irritating because the Python logical operators have a lower precedence than the comparison operators, so you normally write a < 10 and b > 10 (where a and b are, for example, simple integers) and don't need the parentheses.
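A short sketch of the precedence pitfall, using two made-up integer Series. Without parentheses the expression is parsed as a chained comparison, s1 < (10 | s2) > 20, which implicitly uses and and so runs into the ambiguous truth value again:

```python
import pandas as pd

# Illustrative data only.
s1 = pd.Series([5, 15, 25])
s2 = pd.Series([5, 15, 25])

# Parenthesized: each comparison is evaluated first, then combined element-wise.
combined = (s1 < 10) | (s2 > 20)
print(combined.tolist())  # [True, False, True]

# Unparenthesized: parsed as s1 < (10 | s2) > 20, a chained comparison
# that calls bool() on the intermediate Series.
try:
    s1 < 10 | s2 > 20
    raised = False
except ValueError:
    raised = True
print(raised)  # True
```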
It is really important to stress that bit and logical operations are only equivalent for boolean NumPy arrays (and boolean Series & DataFrames). If these don't contain booleans then the operations will give different results. I'll include examples using NumPy arrays but the results will be similar for the pandas data structures:
>>> import numpy as np
>>> a1 = np.array([0, 0, 1, 1])
>>> a2 = np.array([0, 1, 0, 1])
>>> np.logical_and(a1, a2)
array([False, False, False, True])
>>> np.bitwise_and(a1, a2)
array([0, 0, 0, 1], dtype=int32)
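With integers other than 0 and 1 the two families disagree on the values as well, not only on the dtype. A minimal sketch with arbitrary example arrays:

```python
import numpy as np

# Arbitrary non-boolean integers chosen for illustration.
b1 = np.array([1, 2, 4])
b2 = np.array([2, 2, 3])

# Logical "and": every non-zero element counts as True.
print(np.logical_and(b1, b2).tolist())  # [True, True, True]

# Bitwise "and": operates on the binary representation of each integer.
print(np.bitwise_and(b1, b2).tolist())  # [0, 2, 0]
```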
And since NumPy (and similarly pandas) treats boolean indices (boolean or "mask" index arrays) differently from integer indices (index arrays), the results of indexing will also be different:
>>> a3 = np.array([1, 2, 3, 4])
>>> a3[np.logical_and(a1, a2)]
array([4])
>>> a3[np.bitwise_and(a1, a2)]
array([1, 1, 1, 2])
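If the integer arrays are really meant as flags, one possible way to make the bitwise operator safe (an assumption about intent, not the only option) is to convert them to booleans first with astype(bool):

```python
import numpy as np

a1 = np.array([0, 0, 1, 1])
a2 = np.array([0, 1, 0, 1])
a3 = np.array([1, 2, 3, 4])

# After the conversion, & is an element-wise logical "and"
# and the result is a boolean mask rather than an integer index array.
mask = a1.astype(bool) & a2.astype(bool)

print(mask.tolist())      # [False, False, False, True]
print(a3[mask].tolist())  # [4]
```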
Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
-------------------------------------------------------------------------------------
and              | np.logical_and         | np.bitwise_and         | &
or               | np.logical_or          | np.bitwise_or          | |
                 | np.logical_xor         | np.bitwise_xor         | ^
not              | np.logical_not         | np.invert              | ~
The logical operators do not work on NumPy arrays, pandas Series, and pandas DataFrames. The others work on these data structures (and on plain Python objects) and operate element-wise.
However, be careful with the bitwise invert on plain Python bools, because the bool will be interpreted as an integer in this context (for example, ~False returns -1 and ~True returns -2).
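A brief sketch of that pitfall next to the alternatives that behave logically (the variable name flag is just for illustration; note that recent Python versions may emit a DeprecationWarning for ~ on a bool):

```python
import numpy as np

flag = True

# Plain Python bool: treated as the integer 1, so the result is an int.
print(~flag)  # -2

# Alternatives that give a logical negation.
print(not flag)                    # False
print(bool(np.logical_not(flag)))  # False
print(bool(~np.bool_(flag)))       # False, NumPy boolean scalars invert logically
```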