I need to filter rows in a pandas dataframe so that a specific string column contains at least one of a list of provided substrings. The substrings may have unu
I want to find all elements of a pd.Series, v, that contain "at" or "Og", and get 1 if the element contains the pattern or 0 if it doesn't.
I will use re, together with pandas imported as pd:
import re
import pandas as pd
My vector:
v=pd.Series(['cAt','dog','the rat','mouse','froG'])
[Out]:
0 cAt
1 dog
2 the rat
3 mouse
4 froG
I want to find all elements of v that contain "at" or "Og".
That is, I can define my pattern as:
pattern='at|Og'
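If the substrings arrive as a list, as the question describes, the pattern can be built instead of typed by hand. The following is only a sketch: the list contents and the names `substrings` and `pattern_from_list` are made up for illustration. It assumes the substrings should be matched literally, so each one is passed through `re.escape` before joining with `|`.

```python
import re

# Hypothetical list of substrings (not from the original post).
substrings = ["at", "Og", "a.b", "c+"]

# re.escape neutralises regex metacharacters such as "." and "+",
# so every substring is matched literally rather than as a regex.
pattern_from_list = "|".join(re.escape(sub) for sub in substrings)
print(pattern_from_list)
```

Without the escaping step, a substring such as "a.b" would also match "axb", because "." means "any character" in a regular expression.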
Since I want a vector with 1 where the item contains the pattern and 0 where it doesn't, I create a vector of ones with the same length as v:
v_binary=[1]*len(v)
I obtain a boolean Series s that is True where an element of v contains the pattern and False where it doesn't:
s=v.str.contains(pattern, flags=re.IGNORECASE, regex=True)
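Note that `flags=re.IGNORECASE` makes the match case-insensitive, which is why "dog" counts as containing "Og". As a small sketch of the difference (the names `strict` and `loose` are mine, not part of the answer), the same call can be run with and without the flag:

```python
import re
import pandas as pd

v = pd.Series(['cAt', 'dog', 'the rat', 'mouse', 'froG'])
pattern = 'at|Og'

# Case-sensitive: only exact-case occurrences of "at" or "Og" match.
strict = v.str.contains(pattern, regex=True)
# Case-insensitive, as in the answer: "At", "og" and "oG" also match.
loose = v.str.contains(pattern, flags=re.IGNORECASE, regex=True)

print(strict.tolist())  # [False, False, True, False, False]
print(loose.tolist())   # [True, True, True, False, True]
```

If the case of the substrings matters for your data, drop the flag.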
To obtain the binary vector, I multiply v_binary by s:
v_binary*s
[Out]
0 1
1 1
2 1
3 0
4 1
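As an alternative sketch, the boolean Series can be converted to 1/0 with `astype(int)`, which avoids the helper list of ones, and the same mask can be used to filter the rows of a DataFrame, which is what the question asks for. The DataFrame and its column names `animal` and `flag` are hypothetical, chosen only for this example.

```python
import re
import pandas as pd

v = pd.Series(['cAt', 'dog', 'the rat', 'mouse', 'froG'])
s = v.str.contains('at|Og', flags=re.IGNORECASE, regex=True)

# Convert True/False to 1/0 directly, without a helper list of ones.
binary = s.astype(int)

# Hypothetical DataFrame; the column names are assumptions for illustration.
df = pd.DataFrame({'animal': v, 'flag': binary})

# Boolean indexing keeps only the rows whose string matched the pattern.
matches = df[s]

print(binary.tolist())             # [1, 1, 1, 0, 1]
print(matches['animal'].tolist())  # ['cAt', 'dog', 'the rat', 'froG']
```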