I would like to find matching strings in a path and use np.select to create a new column with labels that depend on the matches I found.
This is what I have written
The .str methods operate on object columns. Such columns can hold non-string values, and for those rows pandas returns NaN instead of False. np.select then complains because the condition is not Boolean.
Luckily, there's an argument to handle this: na=False
a["properties_path"].str.contains('blog', na=False)
Alternatively, you could change your conditions to:
a["properties_path"].str.contains('blog') == True
#or
a["properties_path"].str.contains('blog').fillna(False)
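As a rough check that the three forms above produce the same mask, here is a minimal sketch. The Series `col` and the labels are made up for illustration, and the trailing `.astype(bool)` is a defensive addition of mine, on the assumption that `fillna` may leave the result as object dtype in some pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical column mixing a non-string value with strings.
col = pd.Series([1, "blog/post", "about"])

# Option 1: tell str.contains to treat non-string/missing rows as False.
mask_na = col.str.contains("blog", na=False)

# Option 2: comparing with True turns NaN into False.
mask_eq = col.str.contains("blog") == True

# Option 3: fill the NaN afterwards; astype(bool) guards against an object-dtype result.
mask_fill = col.str.contains("blog").fillna(False).astype(bool)

# Each mask is accepted by np.select and yields the same labels.
labels = [
    np.select([m], ["blog"], default="other").tolist()
    for m in (mask_na, mask_eq, mask_fill)
]
```

Of the three, na=False is probably the easiest to read, since the intent is stated where the NaN would otherwise be produced.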
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1, 'foo', 'bar']})
conds = df.a.str.contains('f')
#0 NaN
#1 True
#2 False
#Name: a, dtype: object
np.select([conds], ['XX'])
#ValueError: invalid entry 0 in condlist: should be boolean ndarray
conds = df.a.str.contains('f', na=False)
#0 False
#1 True
#2 False
#Name: a, dtype: bool
np.select([conds], ['XX'])
#array(['0', 'XX', '0'], dtype='<U11')
Your data seem to contain NaN, so the conditions contain NaN as well, which breaks np.select. To fix this, you can do:
s = a["properties_path"].fillna('')
and replace a['properties_path'] in each condition with s.
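Put together, this approach might look like the sketch below. The sample paths, the labels, and the "Other" default are assumptions for illustration; only the column name properties_path comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the paths below are invented for illustration.
a = pd.DataFrame({"properties_path": ["/blog/post-1", None, "/pricing", "/docs/api"]})

# Replace missing values with empty strings so every condition is strictly Boolean.
s = a["properties_path"].fillna("")

# One condition per label, in matching order (labels are hypothetical).
conditions = [s.str.contains("blog"), s.str.contains("pricing")]
choices = ["Blog", "Pricing"]

# Rows that match none of the conditions receive the default label.
a["label"] = np.select(conditions, choices, default="Other")
```

Note that fillna('') only covers missing values. If the column also holds non-string values such as numbers, str.contains still returns NaN for those rows, and na=False is the safer choice.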