Pandas boolean comparison on dataframe

盖世英雄少女心

I get an error when I make a comparison on a single element in a dataframe, but I don't understand why.

I have a dataframe df with timeseries data for a nu

3 answers

迷失自我

2020-12-21 19:54
The problem lies in the if statement.

When you write
```
if this:
    print(that)
```
`this` will be evaluated as `bool(this)`, and that had better come back as `True` or `False`.
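A minimal plain-Python sketch of that rule: the class `Flag` below is hypothetical and exists only to show that both `if` and `bool()` go through the object's `__bool__` method.

```python
class Flag:
    """Hypothetical object that counts how often its truth value is requested."""

    def __init__(self, value):
        self.value = value
        self.calls = 0

    def __bool__(self):
        # `if flag:` and `bool(flag)` both end up calling this method
        self.calls += 1
        return self.value


flag = Flag(True)
if flag:
    print("that")  # runs because __bool__ returned True

print(flag.calls)  # 1
print(bool(flag))  # True
print(flag.calls)  # 2
```

pandas makes `__bool__` raise for a `Series`, which is where the `ValueError` shown further down comes from.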

However, you did:
```
if pd.isnull(df[[customer_ID]].loc[ts]):
    pass  # whatever you do here doesn't matter; the condition itself is the problem
```
Also, you stated that `pd.isnull(df[[customer_ID]].loc[ts])` evaluated to:
```
8143511    True
Name: 2012-07-01 00:00:00, dtype: bool
```
Does that look like `True` or `False`?
What about `bool(pd.isnull(df[[customer_ID]].loc[ts]))`?
```
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
```
So the lesson is: a `pd.Series` cannot be evaluated as `True` or `False`, even when it holds only one element.

It is, however, a `pd.Series` of `True`s and `False`s.

And that is why it doesn't work.
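The failure is easy to reproduce on a toy frame. This is a sketch under assumptions: the question's real `df` is not shown, so the two-row frame below, with a `DatetimeIndex` and a single column named `'8143511'`, is hypothetical. It also shows that reducing the boolean Series with `.any()` or `.all()` gives `if` a single value to test.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame: a DatetimeIndex and
# one column per customer ID (the real data is not shown in the question)
customer_ID = '8143511'
ts = '2012-07-01 00:00:00'
df = pd.DataFrame(
    {customer_ID: [np.nan, 1.5]},
    index=pd.to_datetime(['2012-07-01 00:00:00', '2012-07-01 01:00:00']),
)

mask = pd.isnull(df[[customer_ID]].loc[ts])
print(type(mask).__name__)  # Series
print(len(mask))            # 1

try:
    if mask:  # bool(mask) is called here and raises
        pass
    raised = False
except ValueError as exc:
    raised = True
    print(exc)  # The truth value of a Series is ambiguous. ...

# Reducing the boolean Series to a single bool makes `if` work again
print(mask.any())  # True
print(mask.all())  # True
```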
醉话见心

2020-12-21 19:59
The inner pair of brackets was the problem: `df[[customer_ID]]` is a one-column DataFrame, so `.loc[ts]` returned a Series, which I mistook for a single value. The simplest solution is to drop the extra brackets:
```
if pd.isnull(df[customer_ID].loc[ts]):
    pass
```
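To make the difference between the two bracket styles concrete, here is a sketch on a hypothetical two-column toy frame (the question's real data isn't shown):

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the question's data
customer_ID = '8143511'
ts = '2012-07-01 00:00:00'
df = pd.DataFrame(
    {customer_ID: [np.nan, 1.5], 'other': [2.0, 3.0]},
    index=pd.to_datetime(['2012-07-01 00:00:00', '2012-07-01 01:00:00']),
)

# Double brackets: a list of column labels -> a one-column DataFrame
print(type(df[[customer_ID]]).__name__)          # DataFrame
# ...so selecting one row from it yields a Series (one entry per column)
print(type(df[[customer_ID]].loc[ts]).__name__)  # Series

# Single brackets: one label -> a Series
print(type(df[customer_ID]).__name__)            # Series
# ...so selecting one row from it yields a scalar
value = df[customer_ID].loc[ts]
print(pd.isnull(value))                          # True
```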

爱一瞬间的悲伤

2020-12-21 20:08

The problem is that you need to compare a scalar to get a scalar (`True` or `False`) back, but here you have a one-item Series, which `pd.isnull` converts to a one-item boolean Series.
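The way `pd.isnull` mirrors the shape of its input can be checked directly; a small sketch with made-up inputs:

```python
import numpy as np
import pandas as pd

scalar_result = pd.isnull(np.nan)                                  # scalar in
series_result = pd.isnull(pd.Series([np.nan], index=['8143511']))  # Series in
array_result = pd.isnull(np.array([np.nan, 1.0]))                  # array in

print(scalar_result)                 # True
print(type(series_result).__name__)  # Series
print(array_result)                  # [ True False]
```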

One solution is to convert it to a scalar, either with `Series.item()` or with `.values` and selecting the first value with `[0]`:

```
import pandas as pd

customer_ID = '8143511'
ts = '2012-07-01 00:00:00'

print(df[[customer_ID]].loc[ts].item())
# nan

if pd.isnull(df[[customer_ID]].loc[ts]).item():
    print('super')

print(df[[customer_ID]].loc[ts].values[0])
# nan

if pd.isnull(df[[customer_ID]].loc[ts]).values[0]:
    print('super')
```
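Besides `.item()` and `.values[0]`, positional `.iloc[0]` and `.to_numpy()[0]` pull out the same value. A sketch on a hypothetical toy frame, since the question's data isn't shown:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the question's real df is not shown
customer_ID = '8143511'
ts = '2012-07-01 00:00:00'
df = pd.DataFrame(
    {customer_ID: [np.nan, 1.5]},
    index=pd.to_datetime(['2012-07-01 00:00:00', '2012-07-01 01:00:00']),
)

mask = pd.isnull(df[[customer_ID]].loc[ts])  # one-item boolean Series

# Equivalent ways to pull the single bool out of the Series
by_item = mask.item()          # raises ValueError if len(mask) != 1
by_iloc = mask.iloc[0]         # positional access to the first element
by_numpy = mask.to_numpy()[0]  # newer spelling of .values[0]

if by_item:
    print('super')
```

`.item()` is the strictest of these: if the Series unexpectedly has more than one element, it raises instead of silently taking the first one.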

If you instead pass both labels to `DataFrame.loc`, you get a scalar directly (as long as the index and column names are not duplicated):

```
print(df.loc[ts, customer_ID])
# nan

if pd.isnull(df.loc[ts, customer_ID]):
    print('super')
```
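A sketch of the scalar route on a hypothetical toy frame. It swaps the string label for a `pd.Timestamp` (an assumption, to keep the lookup unambiguously exact), adds `DataFrame.at`, pandas' accessor for a single value by row and column label, and shows the duplicated-label case mentioned above, where `.loc` hands back a Series again.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame; the question's real df is not shown
customer_ID = '8143511'
ts = pd.Timestamp('2012-07-01 00:00:00')  # Timestamp label instead of a string
df = pd.DataFrame(
    {customer_ID: [np.nan, 1.5]},
    index=pd.to_datetime(['2012-07-01 00:00:00', '2012-07-01 01:00:00']),
)

# Row label + column label in one .loc call -> a scalar
print(pd.isnull(df.loc[ts, customer_ID]))  # True
# DataFrame.at is the label-based accessor meant for single values
print(pd.isnull(df.at[ts, customer_ID]))   # True

# Duplicate the first row: now the row label ts appears twice,
# and the same .loc call returns a Series instead of a scalar
dup = pd.concat([df, df.iloc[[0]]])
result = dup.loc[ts, customer_ID]
print(type(result).__name__)  # Series
print(len(result))            # 2
```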
