Pandas boolean comparison on dataframe

盖世英雄少女心 2020-12-21 19:44

I am getting an error when I make a comparison on a single element in a dataframe, but I don't understand why.

I have a dataframe df with timeseries data for a nu

3 Answers
  • 2020-12-21 19:54

    The problem lies in the if statement.

    When you code

    if this:
        print(that)
    

    Python evaluates this as bool(this), and that call has to come back as True or False.

    However, you did:

    if pd.isnull(df[[customer_ID]].loc[ts]):
        pass  # idk what you did here because you didn't say... but doesn't matter
    

    Also, you stated that pd.isnull(df[[customer_ID]].loc[ts]) evaluated to:

    8143511    True
    Name: 2012-07-01 00:00:00, dtype: bool
    

    Does that look like a True or False?
    What about bool(pd.isnull(df[[customer_ID]].loc[ts]))?

    ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
    

    So the lesson is: a pd.Series cannot be evaluated as True or False, even when it holds only one element.

    It is, however, a pd.Series of Trues and Falses.

    And that is why it doesn't work.
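
To make the point above concrete, here is a minimal sketch. The Series `mask` is a hypothetical stand-in built by hand to resemble the output quoted above; it is not the asker's real data. It shows that `bool()` raises on a one-element Series, and that an explicit reduction gives a plain bool.

```python
import pandas as pd

# Hypothetical one-element boolean Series, shaped like the output quoted above.
mask = pd.Series([True], index=["8143511"], name="2012-07-01 00:00:00")

try:
    bool(mask)  # this is what `if mask:` does implicitly
    raised = False
except ValueError as err:
    raised = True
    print(err)  # the "truth value of a Series is ambiguous" message

# Explicit reductions turn the Series into a single Python bool.
any_true = bool(mask.any())     # True if at least one element is True
all_true = bool(mask.all())     # True if every element is True
only_value = bool(mask.item())  # item() requires exactly one element
```

Which reduction is right depends on intent: `any()` and `all()` work for any length, while `item()` insists on exactly one element.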

  • 2020-12-21 19:59

    The double brackets in df[[customer_ID]] select a one-column DataFrame, so .loc[ts] returned a one-element Series, which I mistook for a single value. The simplest solution is to use single brackets:

    if pd.isnull(df[customer_ID].loc[ts]):
        pass
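
A small sketch of the difference between the two selections, assuming a toy frame in place of the real one. The values in `df` are invented for illustration, and `ts` is a `pd.Timestamp` here (an assumption; the question uses a string) so the lookup is unambiguously an exact match.

```python
import numpy as np
import pandas as pd

customer_ID = "8143511"
ts = pd.Timestamp("2012-07-01 00:00:00")  # assumption: exact timestamp key

# Toy data standing in for the asker's time series.
df = pd.DataFrame(
    {customer_ID: [np.nan, 1.5]},
    index=pd.to_datetime(["2012-07-01 00:00:00", "2012-07-01 01:00:00"]),
)

double = df[[customer_ID]].loc[ts]  # DataFrame row -> one-element Series
single = df[customer_ID].loc[ts]    # Series element -> scalar

print(type(double).__name__)
print(type(single).__name__)

# The scalar can be tested directly in an `if`.
if pd.isnull(single):
    result = "missing"
else:
    result = "present"
```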
    
  • 2020-12-21 20:08

    The problem is that if needs a scalar (True or False), but df[[customer_ID]].loc[ts] is a one-item Series, so pd.isnull turns it into a one-item boolean Series.

    One solution is to convert the result to a scalar, either with Series.item() or by taking the first element of .values with [0]:

    customer_ID = '8143511'
    ts = '2012-07-01 00:00:00'
    
    print (df[[customer_ID]].loc[ts].item())
    nan
    
    if pd.isnull(df[[customer_ID]].loc[ts]).item():
        print ('super')
    
    print (df[[customer_ID]].loc[ts].values[0])
    nan
    
    if pd.isnull(df[[customer_ID]].loc[ts]).values[0]:
        print ('super')
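
The two conversions are not quite interchangeable. As a hedged illustration with made-up Series `one` and `two` (not from the question): `item()` refuses anything that is not exactly one element, while `values[0]` quietly takes the first element.

```python
import pandas as pd

# Hypothetical boolean Series of length one and two.
one = pd.Series([True], index=["8143511"])
two = pd.Series([True, False], index=["8143511", "8143512"])

# With exactly one element both approaches agree.
one_item = one.item()
one_first = one.values[0]

# With more than one element, item() raises instead of guessing.
try:
    two.item()
    item_raised = False
except ValueError:
    item_raised = True

# values[0] silently returns the first element and ignores the rest.
two_first = two.values[0]
```

If more than one value would signal a bug upstream, `item()` is the stricter choice because it fails loudly.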
    

    Alternatively, DataFrame.loc with both a row label and a column label returns a scalar directly (as long as the index and column names are not duplicated):

    print (df.loc[ts, customer_ID])
    nan
    
    customer_ID = '8143511'
    ts = '2012-07-01 00:00:00'
    if pd.isnull(df.loc[ts, customer_ID]):
        print ('super')
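
A short sketch of the caveat about duplicated labels, using invented data. The frames `df` and `dup` are hypothetical, and `ts` is a `pd.Timestamp` rather than a string (an assumption made so the row lookup is an exact match).

```python
import numpy as np
import pandas as pd

customer_ID = "8143511"
ts = pd.Timestamp("2012-07-01 00:00:00")  # assumption: exact timestamp key
index = pd.to_datetime(["2012-07-01 00:00:00", "2012-07-01 01:00:00"])

# Unique labels: row label + column label gives a scalar.
df = pd.DataFrame({customer_ID: [np.nan, 1.5]}, index=index)
value = df.loc[ts, customer_ID]
is_missing = bool(pd.isnull(value))

# Duplicated column name: the same lookup gives a Series again.
dup = pd.DataFrame(
    [[np.nan, 2.0], [1.5, 3.0]],
    index=index,
    columns=[customer_ID, customer_ID],
)
dup_value = dup.loc[ts, customer_ID]

print(type(value).__name__)
print(type(dup_value).__name__)
```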
    