Slicing a Pandas DataFrame with a logical (boolean) expression

既然无缘 2021-01-20 15:59

I am getting an exception when I try to slice my Pandas DataFrame with a logical expression.

My data have the following form:

df
    GDP_norm    SP500_         


        
2 Answers
  • 2021-01-20 16:51

    I suggest creating the boolean masks separately, for better readability and easier error handling.

    The parentheses () are missing in your m1 and m2 code; the problem is operator precedence.

    From the docs, 6.16. Operator precedence: the table runs from lowest to highest precedence, so & has higher priority than >=:

    Operator                                Description
    
    lambda                                  Lambda expression
    if – else                               Conditional expression
    or                                      Boolean OR
    and                                     Boolean AND
    not x                                   Boolean NOT
    in, not in, is, is not,                 Comparisons, including membership tests    
    <, <=, >, >=, !=, ==                    and identity tests
    |                                       Bitwise OR
    ^                                       Bitwise XOR
    &                                       Bitwise AND
    
    (expressions...), [expressions...],     Binding or tuple display, list display,       
    {key: value...}, {expressions...}       dictionary display, set display
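
The precedence rule in the table can be checked without pandas. Below is a minimal sketch using plain integers; the values are arbitrary and were chosen only so that the two groupings give different results.

```python
# & binds more tightly than >=, so the unparenthesized form
# groups as 1 >= (2 & 4), not as (1 >= 2) & 4.
unparenthesized = 1 >= 2 & 4    # 2 & 4 == 0, so this is 1 >= 0
explicit_same = 1 >= (2 & 4)    # the grouping Python actually uses
intended = (1 >= 2) & 4         # False & 4 == 0

print(unparenthesized, explicit_same, intended)
```

The first two values agree and the third differs, which is the same mismatch that shows up when comparisons on a Series are combined with & without parentheses.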
    

    m1 = (df['GDP_norm'] >=3.5) & (df['GDP_norm'] <= 4.5)
    m2 = (df['GDP_norm'] >= 4.0) & (df['GDP_norm'] <= 5.0)
    
    m3 = m1 & (df['SP500_Index_deflated_norm'] > 3)
    m4 = m2 & (df['SP500_Index_deflated_norm'] < 3.5)
    
    df[m3 | m4]
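
To try the masks above end to end, here is a small sketch on made-up data. The column names come from the code above; the values are hypothetical, since the original frame is not shown in full.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "GDP_norm": [3.0, 3.6, 4.2, 4.8, 5.5],
    "SP500_Index_deflated_norm": [3.2, 3.1, 2.5, 3.4, 2.0],
})

# Each comparison is wrapped in parentheses before combining with &.
m1 = (df["GDP_norm"] >= 3.5) & (df["GDP_norm"] <= 4.5)
m2 = (df["GDP_norm"] >= 4.0) & (df["GDP_norm"] <= 5.0)

m3 = m1 & (df["SP500_Index_deflated_norm"] > 3)
m4 = m2 & (df["SP500_Index_deflated_norm"] < 3.5)

# Rows that satisfy either combined condition.
selected = df[m3 | m4]
print(selected)
```

With this sample frame, the rows at index 1, 2 and 3 are kept; the first and last rows fall outside both GDP_norm ranges.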
    
  • 2021-01-20 17:02

    You are seeing the combined effect of operator precedence and chained comparisons. Because & binds more tightly than the comparison operators, the expression df['GDP_norm'] >=3.5 & df['GDP_norm'] <= 4.5 is evaluated as something like:

    df['GDP_norm'] >= (3.5 & df['GDP_norm']) <= 4.5
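
One way to confirm that grouping is to ask Python's own parser. The sketch below uses the standard-library ast module; the name a is only a stand-in for df['GDP_norm'], so no pandas is needed.

```python
import ast

# Parse the unparenthesized expression; `a` stands in for df['GDP_norm'].
tree = ast.parse("a >= 3.5 & a <= 4.5", mode="eval")
node = tree.body

# The top-level node is a single chained comparison with two operators.
print(type(node).__name__)
print([type(op).__name__ for op in node.ops])

# The middle operand of the chain is the bitwise AND `3.5 & a`.
middle = node.comparators[0]
print(type(middle).__name__, type(middle.op).__name__)
```

Seen this way, the whole expression is one comparison chain whose middle operand is 3.5 & a, which matches the grouping written above.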
    

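    The original expression fails because the inner 3.5 & df['GDP_norm'] applies a bitwise AND to floating-point data, which pandas rejects; this is the float-versus-bool complaint described in your error message. Instead, wrap each comparison in parentheses to isolate each Boolean mask, and assign the masks to variables: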

    m1 = (df['GDP_norm'] >= 3.5) & (df['GDP_norm'] <= 4.5)
    m2 = df['SP500_Index_deflated_norm'] > 3
    
    m3 = (df['GDP_norm'] >= 4.0) & (df['GDP_norm'] <= 5.0)
    m4 = df['SP500_Index_deflated_norm'] < 3.5
    
    res = df[(m1 & m2) | (m3 & m4)]
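
As a possible alternative to writing both bounds by hand, Series.between builds the same range mask in one call and avoids the precedence trap altogether. This is a swapped-in technique rather than part of the code above, and the sample data is hypothetical. The sketch also reproduces the failure of the unparenthesized form.

```python
import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({
    "GDP_norm": [3.0, 3.6, 4.2, 4.8, 5.5],
    "SP500_Index_deflated_norm": [3.2, 3.1, 2.5, 3.4, 2.0],
})

# The unparenthesized form raises instead of returning a mask.
try:
    _ = df["GDP_norm"] >= 3.5 & df["GDP_norm"] <= 4.5
    raised = False
except (TypeError, ValueError) as exc:
    raised = True
    print(type(exc).__name__)

# Series.between is inclusive on both ends by default,
# so it matches (s >= low) & (s <= high).
m1 = df["GDP_norm"].between(3.5, 4.5)
m2 = df["SP500_Index_deflated_norm"] > 3

m3 = df["GDP_norm"].between(4.0, 5.0)
m4 = df["SP500_Index_deflated_norm"] < 3.5

res = df[(m1 & m2) | (m3 & m4)]
print(res)
```

The parenthesized version remains the more general tool, since between only covers the "value within a range" case.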
    