Numpy select returning boolean error message

孤独总比滥情好 2020-12-19 18:18

I would like to find matching strings in a path and use np.select to create a new column with labels dependent on the matches I found.

This is what I have written

2 Answers
  • 2020-12-19 18:49

    The .str methods operate on object columns. Such columns can hold non-string values, and for those rows pandas returns NaN instead of False. The result is then an object-dtype Series rather than a boolean one, which is why np.select complains.

    Luckily, there's an argument to handle this: na=False

    a["properties_path"].str.contains('blog', na=False)
    

    Alternatively, you could change your conditions to:

    a["properties_path"].str.contains('blog') == True
    #or
    a["properties_path"].str.contains('blog').fillna(False)
    

    Sample

    import pandas as pd
    import numpy as np
    
    df = pd.DataFrame({'a': [1, 'foo', 'bar']})
    conds = df.a.str.contains('f')
    #0      NaN
    #1     True
    #2    False
    #Name: a, dtype: object
    
    np.select([conds], ['XX'])
    #ValueError: invalid entry 0 in condlist: should be boolean ndarray
    
    conds = df.a.str.contains('f', na=False)
    #0    False
    #1     True
    #2    False
    #Name: a, dtype: bool
    
    np.select([conds], ['XX'])
    #array(['0', 'XX', '0'], dtype='<U11')
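
As a quick check that the three variants mentioned in this answer agree, here is a minimal sketch. The sample values are made up for illustration, and the trailing `.astype(bool)` on the `fillna` variant is my own addition: depending on the pandas version, `fillna(False)` on an object column may keep the object dtype, which np.select would still reject.

```python
import pandas as pd

# Hypothetical column mixing strings, a missing value and a non-string value
paths = pd.Series(["/blog/post-1", None, 42, "/shop/item"], dtype=object)

# Variant 1: let str.contains fill missing results itself
with_na_arg = paths.str.contains("blog", na=False)

# Variant 2: comparing with True turns NaN into False
with_eq_true = paths.str.contains("blog") == True  # noqa: E712

# Variant 3: fill afterwards; astype(bool) guards against a leftover object dtype
with_fillna = paths.str.contains("blog").fillna(False).astype(bool)

print(with_na_arg.tolist())
print(with_na_arg.dtype, with_eq_true.dtype, with_fillna.dtype)
```

All three masks should come out as plain boolean Series with the same values, so any of them can be passed to np.select.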
    
  • 2020-12-19 18:51

    Your data seems to contain NaN, so the conditions contain NaN as well, which breaks np.select. To fix this, you can do:

    s = a["properties_path"].fillna('')
    

    and replace a['properties_path'] in each condition with s.
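
A minimal sketch of how that could look end to end. The original conditions are not shown in the question, so the frame contents, the search strings and the label names below are all assumptions for illustration. Note that `fillna('')` only covers missing values; a column that also holds non-string objects would still need `na=False`.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's frame `a`
a = pd.DataFrame(
    {"properties_path": ["/blog/intro", np.nan, "/docs/setup", "/pricing"]}
)

# Replace missing paths with an empty string so every row is a real string
s = a["properties_path"].fillna("")

# Each condition is now a plain boolean mask with no NaN in it
conditions = [s.str.contains("blog"), s.str.contains("docs")]
choices = ["blog", "docs"]  # hypothetical labels

# A string default keeps the choices and the default in a common dtype
a["label"] = np.select(conditions, choices, default="other")
print(a)
```

Rows that match none of the conditions, including the one that was NaN, fall through to the default label.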
