I am getting an exception when I try to slice my Pandas DataFrame with a logical expression.
My data have the following form:
df
GDP_norm SP500_
I suggest creating the boolean masks separately for better readability and easier error handling.
The parentheses are missing in your m1 and m2 code; the problem is operator precedence.
In the Python docs (section 6.16, Operator precedence), & has higher priority than >=. The relevant part of the table, ordered from lowest to highest precedence:
Operator                                      Description
lambda                                        Lambda expression
if – else                                     Conditional expression
or                                            Boolean OR
and                                           Boolean AND
not x                                         Boolean NOT
in, not in, is, is not, <, <=, >, >=, !=, ==  Comparisons, including membership tests and identity tests
|                                             Bitwise OR
^                                             Bitwise XOR
&                                             Bitwise AND
(expressions...), [expressions...],           Binding or tuple display, list display,
{key: value...}, {expressions...}             dictionary display, set display
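A quick way to see this precedence rule without any DataFrame is to try it on plain integers. The numbers below are arbitrary and chosen only so that the two groupings give different results:

```python
# & binds more tightly than >=, so the unparenthesized form
# groups as 2 >= (3 & 4), not as (2 >= 3) & 4.
unparenthesized = 2 >= 3 & 4   # 3 & 4 == 0, so this is 2 >= 0
explicit_inner = 2 >= (3 & 4)  # the grouping Python actually uses
intended = (2 >= 3) & 4        # False & 4 == 0

print(unparenthesized, explicit_inner, intended)
```

The first two values agree, while the parenthesized comparison gives a different result, which is why the masks need explicit parentheses.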
m1 = (df['GDP_norm'] >=3.5) & (df['GDP_norm'] <= 4.5)
m2 = (df['GDP_norm'] >= 4.0) & (df['GDP_norm'] <= 5.0)
m3 = m1 & (df['SP500_Index_deflated_norm'] > 3)
m4 = m2 & (df['SP500_Index_deflated_norm'] < 3.5)
df[m3 | m4]
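As a rough check, the masks can be run against a small DataFrame. The values below are made up purely for illustration (the real data is not shown in full); only the two column names come from the code above:

```python
import pandas as pd

# Hypothetical sample values, only to exercise the masks.
df = pd.DataFrame({
    "GDP_norm": [3.0, 3.6, 4.2, 4.8, 5.5],
    "SP500_Index_deflated_norm": [3.2, 3.1, 2.5, 3.4, 3.0],
})

# Each comparison is wrapped in parentheses before combining with &.
m1 = (df["GDP_norm"] >= 3.5) & (df["GDP_norm"] <= 4.5)
m2 = (df["GDP_norm"] >= 4.0) & (df["GDP_norm"] <= 5.0)
m3 = m1 & (df["SP500_Index_deflated_norm"] > 3)
m4 = m2 & (df["SP500_Index_deflated_norm"] < 3.5)

selected = df[m3 | m4]
print(selected.index.tolist())
```

With this sample, the first and last rows fall outside both GDP_norm ranges, so only the three middle rows are kept.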
You are suffering from the effects of chained comparisons. Because & binds more tightly than the comparison operators, the expression df['GDP_norm'] >=3.5 & df['GDP_norm'] <= 4.5
is evaluated as something like:
df['GDP_norm'] >= (3.5 & df['GDP_norm']) <= 4.5
Of course, this fails since float cannot be compared with bool, as described in your error message. Instead, use parentheses to isolate each Boolean mask and assign them to variables:
m1 = (df['GDP_norm'] >= 3.5) & (df['GDP_norm'] <= 4.5)
m2 = df['SP500_Index_deflated_norm'] > 3
m3 = (df['GDP_norm'] >= 4.0) & (df['GDP_norm'] <= 5.0)
m4 = df['SP500_Index_deflated_norm'] < 3.5
res = df[(m1 & m2) | (m3 & m4)]
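If you want to confirm how Python groups the unparenthesized condition, one option is to parse it with the standard ast module, which inspects the expression without evaluating it (so no DataFrame is needed). This is only a sketch for inspection, not part of the fix:

```python
import ast

# Parse the unparenthesized condition without evaluating it.
expr = "df['GDP_norm'] >= 3.5 & df['GDP_norm'] <= 4.5"
node = ast.parse(expr, mode="eval").body

# The top-level node is one chained comparison: a >= b <= c.
is_chain = isinstance(node, ast.Compare) and len(node.ops) == 2
op_names = [type(op).__name__ for op in node.ops]

# The middle operand of the chain is the bitwise AND: 3.5 & df['GDP_norm'].
middle = node.comparators[0]
middle_is_bitand = isinstance(middle, ast.BinOp) and isinstance(middle.op, ast.BitAnd)

print(is_chain, op_names, middle_is_bitand)
```

The parse shows a single comparison chain with the & expression sitting in the middle, matching the grouping described at the start of this answer.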