I am getting the error when I make a comparison on a single element in a dataframe, but I don't understand why.
I have a dataframe df with timeseries data for a nu
The problem lies in the if statement. When you code

    if this:
        print(that)

this will be evaluated as bool(this). And that better come back as True or False.
However, you did:

    if pd.isnull(df[[customer_ID]].loc[ts]):
        pass  # idk what you did here because you didn't say... but it doesn't matter
Also, you stated that pd.isnull(df[[customer_ID]].loc[ts]) evaluated to:

    8143511    True
    Name: 2012-07-01 00:00:00, dtype: bool
Does that look like a True or False? What about bool(pd.isnull(df[[customer_ID]].loc[ts]))?

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
So the lesson is: a pd.Series cannot be evaluated as True or False, not even when it holds a single element. It is, however, a pd.Series of Trues and Falses. And that is why it doesn't work.
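To see this for yourself, here is a minimal sketch that reproduces the error. The DataFrame is a made-up stand-in for the one in the question (one customer column, two dates), and ts is a pd.Timestamp rather than a string, just to keep the lookup unambiguous.

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df.
df = pd.DataFrame(
    {"8143511": [np.nan, 1.5]},
    index=pd.to_datetime(["2012-07-01", "2012-07-02"]),
)
customer_ID = "8143511"
ts = pd.Timestamp("2012-07-01")  # assumption: Timestamp instead of a string key

# Double brackets keep a DataFrame, so .loc[ts] returns a one-element Series.
mask = pd.isnull(df[[customer_ID]].loc[ts])
print(type(mask).__name__)  # Series

# `if mask:` calls bool(mask) under the hood, which pandas refuses to do.
try:
    bool(mask)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```

The Series has exactly one element here, and bool() still raises. pandas does not special-case length-one Series.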
The second set of [] was returning a Series, which I mistook for a single value. The simplest solution is to remove the extra []:

    if pd.isnull(df[customer_ID].loc[ts]):
        pass
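The difference between the two spellings can be checked directly. This is only a sketch on made-up data (the DataFrame and the Timestamp key are assumptions, not the asker's real frame):

```python
import numpy as np
import pandas as pd

# Hypothetical data standing in for the question's df.
df = pd.DataFrame(
    {"8143511": [np.nan, 1.5]},
    index=pd.to_datetime(["2012-07-01", "2012-07-02"]),
)
customer_ID = "8143511"
ts = pd.Timestamp("2012-07-01")

print(type(df[[customer_ID]]).__name__)  # DataFrame (list of columns)
print(type(df[customer_ID]).__name__)    # Series (single column)

two_brackets = df[[customer_ID]].loc[ts]  # a row of a DataFrame: one-element Series
one_bracket = df[customer_ID].loc[ts]     # an element of a Series: scalar

print(type(two_brackets).__name__)  # Series
print(pd.isnull(one_bracket))       # True
```

With single brackets, pd.isnull receives a scalar and returns a plain boolean, which is what the if statement needs.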
The problem is that the if needs a scalar (True or False), but here there is a one-item Series, which pd.isnull converts to a one-item boolean Series.
The solution is to convert it to a scalar using Series.item, or using values and selecting the first value with [0]:
    customer_ID = '8143511'
    ts = '2012-07-01 00:00:00'

    print(df[[customer_ID]].loc[ts].item())
    nan

    if pd.isnull(df[[customer_ID]].loc[ts]).item():
        print('super')

    print(df[[customer_ID]].loc[ts].values[0])
    nan

    if pd.isnull(df[[customer_ID]].loc[ts]).values[0]:
        print('super')
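One caveat worth knowing: item() only works when the Series really has one element. A small sketch with made-up Series (not the asker's data) shows what happens otherwise, and how any() and all() reduce a longer boolean Series to a single value:

```python
import numpy as np
import pandas as pd

one = pd.Series([np.nan])       # hypothetical one-item Series
two = pd.Series([np.nan, 1.0])  # hypothetical two-item Series

print(pd.isnull(one).item())  # True

# item() refuses to pick a value when there is more than one element.
try:
    pd.isnull(two).item()
    item_failed = False
except ValueError:
    item_failed = True
print(item_failed)  # True

# For longer Series, say explicitly how the values should be combined.
print(pd.isnull(two).any())  # True
print(pd.isnull(two).all())  # False
```

values[0], by contrast, silently takes the first element whatever the length, so item() is the stricter of the two.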
But if you use DataFrame.loc with both the row and the column label, you get a scalar directly (provided there are no duplicated index or column names):

    print(df.loc[ts, customer_ID])
    nan

    customer_ID = '8143511'
    ts = '2012-07-01 00:00:00'

    if pd.isnull(df.loc[ts, customer_ID]):
        print('super')
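The remark about duplicated names can be illustrated with a short sketch. Both frames below are made up for the example, and the Timestamp key is an assumption:

```python
import numpy as np
import pandas as pd

idx = pd.to_datetime(["2012-07-01", "2012-07-02"])
ts = pd.Timestamp("2012-07-01")

# Unique column names: row label + column label gives a scalar.
unique = pd.DataFrame({"8143511": [np.nan, 1.5]}, index=idx)
value = unique.loc[ts, "8143511"]
print(pd.isnull(value))  # True

# Hypothetical frame where the same label is used for two columns.
dup = pd.DataFrame(
    [[np.nan, 2.0], [1.5, 3.0]],
    index=idx,
    columns=["8143511", "8143511"],
)
result = dup.loc[ts, "8143511"]
print(type(result).__name__)  # Series
print(len(result))            # 2
```

In the duplicated case the lookup returns a Series again, so the if statement would raise the same ValueError as before.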