Logical operators for boolean indexing in Pandas

清歌不尽 2020-11-21 23:13

I'm working with boolean index in Pandas. The question is why the statement:

a[(a['some_column']==some_number) & (a['some_other_column']==some_other


        
3 answers
  •  忘了有多久
    2020-11-21 23:22

    Logical operators for boolean indexing in Pandas

    It's important to realize that you cannot use any of the Python logical operators (and, or, or not) on a pandas.Series or pandas.DataFrame (similarly, you cannot use them on a numpy.array with more than one element). These operators implicitly call bool on their operands, which raises an exception because these data structures treat the truth value of an array as ambiguous:

    >>> import numpy as np
    >>> import pandas as pd
    >>> arr = np.array([1,2,3])
    >>> s = pd.Series([1,2,3])
    >>> df = pd.DataFrame([1,2,3])
    >>> bool(arr)
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
    >>> bool(s)
    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    >>> bool(df)
    ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
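
    As a minimal sketch of what this means for boolean indexing (the Series x and y are made up for illustration), combining two masks with and runs into the same ValueError, while & combines them element-wise:

```python
import pandas as pd

# Hypothetical example data, chosen only for illustration.
x = pd.Series([1, 2, 3])
y = pd.Series([3, 2, 1])

# `and` calls bool() on its left operand, which is ambiguous for a Series.
try:
    (x > 1) and (y > 1)
    and_error = None
except ValueError as exc:
    and_error = exc

# `&` never calls bool(); it combines the two masks element by element.
mask = (x > 1) & (y > 1)

print(type(and_error).__name__)
print(mask.tolist())
```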
    

    I did cover this more extensively in my answer to the "Truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()" Q+A.

    NumPy's logical functions

    However, NumPy provides element-wise equivalents of these operators as functions that can be used on numpy.array, pandas.Series, pandas.DataFrame, or any other (conforming) numpy.array subclass:

    • and has np.logical_and
    • or has np.logical_or
    • not has np.logical_not
    • np.logical_xor, which has no Python operator equivalent but performs a logical "exclusive or"

    So, essentially, one should use (assuming df1 and df2 are pandas DataFrames):

    np.logical_and(df1, df2)
    np.logical_or(df1, df2)
    np.logical_not(df1)
    np.logical_xor(df1, df2)
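
    A small runnable sketch of these four calls, assuming two boolean DataFrames with the same shape and labels (the column name "flag" and the values are invented for the example):

```python
import numpy as np
import pandas as pd

# Hypothetical boolean frames covering all four True/False combinations.
df1 = pd.DataFrame({"flag": [True, True, False, False]})
df2 = pd.DataFrame({"flag": [True, False, True, False]})

both = np.logical_and(df1, df2)         # True where both are True
either = np.logical_or(df1, df2)        # True where at least one is True
negated = np.logical_not(df1)           # flips every value of df1
exactly_one = np.logical_xor(df1, df2)  # True where exactly one is True

for name, result in [("and", both), ("or", either),
                     ("not", negated), ("xor", exactly_one)]:
    print(name, np.asarray(result).ravel().tolist())
```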
    

    Bitwise functions and bitwise operators for booleans

    However, if you have boolean NumPy arrays, pandas Series, or pandas DataFrames, you can also use the element-wise bitwise functions (for booleans they are, or at least should be, indistinguishable from the logical functions):

    • bitwise and: np.bitwise_and or the & operator
    • bitwise or: np.bitwise_or or the | operator
    • bitwise not: np.invert (or the alias np.bitwise_not) or the ~ operator
    • bitwise xor: np.bitwise_xor or the ^ operator
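
    To illustrate the claim that the two families agree on booleans, here is a sketch comparing each operator with its logical function on two made-up boolean Series (m1 and m2 are hypothetical names):

```python
import numpy as np
import pandas as pd

# Hypothetical boolean masks covering all four True/False combinations.
m1 = pd.Series([True, True, False, False])
m2 = pd.Series([True, False, True, False])

# Each entry pairs the bitwise operator result with the logical function result.
pairs = {
    "and": (m1 & m2, np.logical_and(m1, m2)),
    "or": (m1 | m2, np.logical_or(m1, m2)),
    "xor": (m1 ^ m2, np.logical_xor(m1, m2)),
    "not": (~m1, np.logical_not(m1)),
}

for name, (bitwise, logical) in pairs.items():
    same = np.array_equal(np.asarray(bitwise), np.asarray(logical))
    print(name, np.asarray(bitwise).tolist(), "same as logical:", same)
```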

    Typically the operators are used. However, when combining them with comparison operators, remember to wrap each comparison in parentheses, because the bitwise operators have a higher precedence than the comparison operators:

    (df1 < 10) | (df2 > 10)  # instead of the wrong df1 < 10 | df2 > 10
    

    This may be irritating because the Python logical operators have a lower precedence than the comparison operators, so you would normally write a < 10 and b > 10 (where a and b are, for example, simple integers) without needing the parentheses.
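
    The precedence problem can be sketched as follows, assuming two small integer Series (s1, s2 and the thresholds are invented for the example). Without parentheses, Python groups the expression as s1 < (10 | s2) > 20, a chained comparison that again calls bool on a Series:

```python
import pandas as pd

# Hypothetical integer data, chosen only for illustration.
s1 = pd.Series([5, 15, 25])
s2 = pd.Series([5, 15, 25])

# With parentheses: each comparison yields a mask, then | combines the masks.
ok = (s1 < 10) | (s2 > 20)

# Without parentheses: | binds tighter than < and >, so this is parsed as
# s1 < (10 | s2) > 20, and the chained comparison calls bool() on a Series.
try:
    s1 < 10 | s2 > 20
    error = None
except ValueError as exc:
    error = exc

print(ok.tolist())
print(type(error).__name__)
```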

    Differences between logical and bitwise operations (on non-booleans)

    It is really important to stress that bitwise and logical operations are only equivalent for boolean NumPy arrays (and boolean Series and DataFrames). If these don't contain booleans, the operations give different results. I'll include examples using NumPy arrays, but the results will be similar for the pandas data structures:

    >>> import numpy as np
    >>> a1 = np.array([0, 0, 1, 1])
    >>> a2 = np.array([0, 1, 0, 1])
    
    >>> np.logical_and(a1, a2)
    array([False, False, False,  True])
    >>> np.bitwise_and(a1, a2)
    array([0, 0, 0, 1], dtype=int32)
    

    And since NumPy (and similarly pandas) treats boolean indices (boolean or “mask” index arrays) differently from integer indices (index arrays), the results of indexing will also be different:

    >>> a3 = np.array([1, 2, 3, 4])
    
    >>> a3[np.logical_and(a1, a2)]
    array([4])
    >>> a3[np.bitwise_and(a1, a2)]
    array([1, 1, 1, 2])
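
    If the operands really are integer arrays holding only 0 and 1, one possible way to get mask behaviour from the bitwise result is to convert it to bool before indexing. This is only a sketch reusing the arrays above; the astype(bool) step is an assumption about what fits such data, and it would treat any nonzero value as True:

```python
import numpy as np

a1 = np.array([0, 0, 1, 1])
a2 = np.array([0, 1, 0, 1])
a3 = np.array([1, 2, 3, 4])

int_index = a1 & a2                 # integer array: treated as positions
bool_mask = (a1 & a2).astype(bool)  # boolean array: treated as a mask

by_position = a3[int_index]  # picks a3[0], a3[0], a3[0], a3[1]
by_mask = a3[bool_mask]      # keeps only the entries where the mask is True

print(by_position.tolist())
print(by_mask.tolist())
```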
    

    Summary table

    Logical operator | NumPy logical function | NumPy bitwise function | Bitwise operator
    -------------------------------------------------------------------------------------
           and       |  np.logical_and        | np.bitwise_and         |        &
    -------------------------------------------------------------------------------------
           or        |  np.logical_or         | np.bitwise_or          |        |
    -------------------------------------------------------------------------------------
                     |  np.logical_xor        | np.bitwise_xor         |        ^
    -------------------------------------------------------------------------------------
           not       |  np.logical_not        | np.invert              |        ~
    

    The logical operators do not work for NumPy arrays, pandas Series, and pandas DataFrames. The others work on these data structures (and on plain Python objects) and operate element-wise. However, be careful with the bitwise invert on plain Python bools, because the bool is interpreted as an integer in this context (for example, ~False returns -1 and ~True returns -2).
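
    The caveat about plain Python bools can be checked with a short sketch (the variable names are made up). Note that the inverted value is still truthy, which is what makes this easy to get wrong, and that Python 3.12 and later emit a DeprecationWarning for ~ on a bool:

```python
import numpy as np

flag = True

inverted_bits = ~flag              # bitwise invert of the integer 1
negated = not flag                 # logical negation of a plain bool
negated_np = np.logical_not(flag)  # NumPy's logical negation

print(inverted_bits, bool(inverted_bits))
print(negated, bool(negated_np))
```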
