python: pandas np.where vs. df.loc with multiple conditions

孤城傲影 2021-01-02 20:59

np.where has been giving me a lot of errors, so I am looking for a solution with df.loc instead.

This is the np.where error I have been getting:

C:\\         


        
1 Answer
  • 2021-01-02 21:15

    I think your booleans are not strings, so the quotes (') around True need to be removed:

    import numpy as np
    import pandas as pd
    
    df = pd.DataFrame({'Column_A': ['AAA','AAA','ABC','CDE'],
                       'checked': ['0','0','1','0'],
                       'duplicate': [True, True, False, False]})
    
    # 'checked' never equals 'Y' in this sample, so every row gets '0'
    df['flag'] = np.where((df['checked'] == 'Y') & (df['duplicate'] == True), 'Y', '0')
    print(df)
      Column_A checked  duplicate flag
    0      AAA       0       True    0
    1      AAA       0       True    0
    2      ABC       1      False    0
    3      CDE       0      False    0
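
To see why the quotes matter, here is a minimal sketch on the same sample data. It assumes the original code compared the boolean column with the string 'True'; the question's code is not shown above, so that part is a guess.

```python
import pandas as pd

df = pd.DataFrame({'Column_A': ['AAA', 'AAA', 'ABC', 'CDE'],
                   'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# 'duplicate' holds real booleans, while 'checked' holds strings
is_bool = pd.api.types.is_bool_dtype(df['duplicate'])

# Comparing booleans with the string 'True' matches no row
quoted = (df['duplicate'] == 'True').tolist()

# Comparing with the boolean True matches the first two rows
unquoted = (df['duplicate'] == True).tolist()

print(is_bool, quoted, unquoted)
```

With the quoted version the mask is False everywhere, so np.where would always fall through to the default value.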
    

    Or, when comparing against a boolean column, == True can be omitted:

    df['flag'] = np.where((df['checked'] == 'Y') & df['duplicate'], 'Y', '0')
    print(df)
      Column_A checked  duplicate flag
    0      AAA       0       True    0
    1      AAA       0       True    0
    2      ABC       1      False    0
    3      CDE       0      False    0
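
As a quick check of that claim, the sketch below builds the mask both ways and compares them. The names explicit and implicit are only illustrative. Note that & binds more tightly than ==, so each comparison still needs its own parentheses.

```python
import pandas as pd

df = pd.DataFrame({'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# Mask with the redundant "== True" comparison
explicit = (df['checked'] == '0') & (df['duplicate'] == True)

# Mask that uses the boolean column directly
implicit = (df['checked'] == '0') & df['duplicate']

# Both masks select exactly the same rows
same = explicit.equals(implicit)
print(same)
```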
    

    Also, to test the checked column, the value needs quotes (') because the column holds strings:

    df['flag'] = np.where((df['checked'] == '0') & (df['duplicate'] == True), 'Y', '0')
    print(df)
      Column_A checked  duplicate flag
    0      AAA       0       True    Y
    1      AAA       0       True    Y
    2      ABC       1      False    0
    3      CDE       0      False    0
    

    EDIT:

    Solution with loc:

    # start with the default value in every row
    df['flag'] = '0'
    mask = (df['checked'] == '0') & df['duplicate']
    # overwrite only the rows where both conditions hold
    df.loc[mask, 'flag'] = 'Y'
    print(df)
      Column_A checked  duplicate flag
    0      AAA       0       True    Y
    1      AAA       0       True    Y
    2      ABC       1      False    0
    3      CDE       0      False    0
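
For a side-by-side comparison, the following sketch applies both approaches to the same mask. The column names flag_where and flag_loc are made up for this illustration and are not part of the original data.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'checked': ['0', '0', '1', '0'],
                   'duplicate': [True, True, False, False]})

# One mask shared by both approaches
mask = (df['checked'] == '0') & df['duplicate']

# np.where: build the whole column in a single call
df['flag_where'] = np.where(mask, 'Y', '0')

# loc: fill in the default first, then overwrite the matching rows
df['flag_loc'] = '0'
df.loc[mask, 'flag_loc'] = 'Y'

# Both columns should hold the same values
print(df['flag_where'].tolist() == df['flag_loc'].tolist())
```

np.where is convenient when the column is built from scratch, while loc is handy when only some rows of an existing column should change.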
    