Pandas boolean DataFrame selection ambiguity

一整个雨季 2021-01-15 22:50

EDIT: Fixed values in tables.

Let's say I have a pandas DataFrame df:

>>>df
                  a         b         c
        0  0.016367  0.         


        
3 Answers
  • 2021-01-15 23:23

    Since the logical operators (and, or, not) cannot be overridden in Python, NumPy and pandas override the bitwise operators instead.

    This means you need to use the bitwise-or operator:

    df[(df > 0.5) | (df < 0)]
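
    As a minimal sketch of the difference (the small frame below is hypothetical sample data, not the one from the question), the or version fails while the | version selects element by element:

```python
import pandas as pd

# Hypothetical sample data, not the frame from the question.
df = pd.DataFrame({"a": [0.9, -0.3, 0.2], "b": [0.1, 0.7, -1.5]})

try:
    # `or` needs one truth value for the whole left-hand DataFrame.
    df[(df > 0.5) or (df < 0)]
    or_error = None
except ValueError as exc:
    # pandas refuses to pick a single True/False for many elements.
    or_error = exc

# `|` is dispatched to DataFrame.__or__, which combines the two
# boolean masks element by element.
selected = df[(df > 0.5) | (df < 0)]
print(selected)
```

    Values that match neither condition come back as NaN in selected.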
    
  • 2021-01-15 23:30

    Custom types cannot override the behavior of and and or in Python. That is, NumPy cannot make [0, 1, 1] and [1, 1, 0] evaluate to [0, 1, 0]. The reason is short-circuiting (see the documentation): and and or first reduce the left operand to a single truth value and then return one of the two operands unchanged, so they never get the chance to combine data from both operands at once (for instance, to compare the elements componentwise, as would be natural for NumPy).
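
    The short-circuiting can be made visible with a toy class. This is only an illustrative sketch: Loud and its call log are hypothetical names invented for the example, not part of NumPy or pandas.

```python
import numpy as np


class Loud:
    """Hypothetical toy type that records which special methods Python calls."""

    def __init__(self, name, calls):
        self.name = name
        self.calls = calls

    def __bool__(self):
        # Invoked by `and` / `or` to get a single truth value.
        self.calls.append(f"bool({self.name})")
        return True

    def __and__(self, other):
        # Invoked by `&`; receives both operands.
        self.calls.append(f"{self.name} & {other.name}")
        return Loud(f"({self.name}&{other.name})", self.calls)


calls = []
x, y = Loud("x", calls), Loud("y", calls)

result_and = x and y  # asks only for bool(x), then hands back y unchanged
result_amp = x & y    # calls x.__and__(y), which can inspect both operands

# The arrays from the answer: `&` works componentwise.
a = np.array([0, 1, 1])
b = np.array([1, 1, 0])
elementwise = a & b

print(calls)        # ['bool(x)', 'x & y']
print(elementwise)  # [0 1 0]
```

    Note that x and y never calls any method that receives both operands, which is why no library can give it componentwise meaning.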

    The solution is to use the bitwise operators & and |. However, you do have to be careful with this: & and | bind more tightly than comparison operators such as > and <, so each comparison has to be wrapped in parentheses.
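
    A small sketch of the precedence pitfall, using hypothetical sample data. Without parentheses, Python parses the expression as df > (0.5 | df) < 0, which fails before any selection happens; the exact exception type is not asserted here, so both likely ones are caught.

```python
import pandas as pd

df = pd.DataFrame({"a": [0.9, -0.3, 0.2]})  # hypothetical sample data

try:
    # Parsed as df > (0.5 | df) < 0, because | binds tighter than > and <.
    df[df > 0.5 | df < 0]
    unparenthesized_error = None
except (TypeError, ValueError) as exc:
    unparenthesized_error = exc

# With parentheses, each comparison is evaluated first and the
# resulting boolean masks are then combined.
mask = (df > 0.5) | (df < 0)
selected = df[mask]
print(selected)
```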

  • 2021-01-15 23:41

    You need to use the bitwise-or operator (|) and put each condition in parentheses:

    df[(df > 0.5) | (df < 0)]
    

    The reason is that the truth value of a whole array is ambiguous: some of the values may satisfy the condition while others do not, so there is no single True or False answer.

    If you reduced the comparison result with any, it would evaluate to True as long as at least one value satisfied the condition.
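
    As a rough illustration with hypothetical sample data: asking for the truth value of the mask directly fails, while any and all state explicitly how the element-wise results should be reduced.

```python
import pandas as pd

df = pd.DataFrame({"a": [0.9, -0.3], "b": [0.1, 0.2]})  # hypothetical sample data
mask = df > 0.5  # element-wise comparison: a DataFrame of booleans

try:
    bool(mask)
    ambiguous = False
except ValueError:
    # One True/False cannot stand in for several elements.
    ambiguous = True

per_column = mask.any()         # Series: one boolean per column
any_at_all = mask.any().any()   # True: at least one element exceeds 0.5
all_of_them = mask.all().all()  # False: not every element exceeds 0.5

print(per_column)
print(any_at_all, all_of_them)
```

    On a DataFrame, any reduces per column by default, which is why it is applied twice here to get a single value.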

    The parentheses are required because of operator precedence.

    Example:

    In [23]:
    
    import pandas as pd
    from numpy.random import randn

    df = pd.DataFrame(randn(5,5))  # 5x5 frame of random normal values
    df
    Out[23]:
              0         1         2         3         4
    0  0.320165  0.123677 -0.202609  1.225668  0.327576
    1 -0.620356  0.126270  1.191855  0.903879  0.214802
    2 -0.974635  1.712151  1.178358  0.224962 -0.921045
    3 -1.337430 -1.225469  1.150564 -1.618739 -1.297221
    4 -0.093164 -0.928846  1.035407  1.766096  1.456888
    In [24]:
    
    df[(df > 0.5) | (df < 0)]
    Out[24]:
              0         1         2         3         4
    0       NaN       NaN -0.202609  1.225668       NaN
    1 -0.620356       NaN  1.191855  0.903879       NaN
    2 -0.974635  1.712151  1.178358       NaN -0.921045
    3 -1.337430 -1.225469  1.150564 -1.618739 -1.297221
    4 -0.093164 -0.928846  1.035407  1.766096  1.456888
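
    The NaN entries above appear because indexing with a boolean DataFrame keeps the shape and masks non-matching cells, much like DataFrame.where. The sketch below uses hypothetical sample data; the row-filtering line is an optional variation, not something the question asked for.

```python
import pandas as pd

# Hypothetical sample data; the last row matches neither condition.
df = pd.DataFrame({"a": [0.9, -0.3, 0.2], "b": [0.1, 0.7, 0.3]})
mask = (df > 0.5) | (df < 0)

by_indexing = df[mask]     # same shape as df, NaN where mask is False
by_where = df.where(mask)  # explicit spelling of the same masking
same = by_indexing.equals(by_where)

# Variation: keep only the rows in which at least one cell matches.
kept_rows = df[mask.any(axis=1)]

print(same)
print(kept_rows)
```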
    