Pandas filtering for multiple substrings in series

说谎 2020-11-22 04:08

I need to filter rows in a pandas dataframe so that a specific string column contains at least one of a list of provided substrings. The substrings may have unu

3 answers
  •  遥遥无期
    2020-11-22 04:26

    Using a simpler example and ignoring case (upper or lower)

    Filtering and getting a binary vector:

    I want to find all elements of a pd.Series, v, that contain "at" or "Og", and get 1 if the element contains the pattern or 0 if it doesn't.

    I'll use the re module (along with pandas):
    import re
    import pandas as pd
    

    My vector:

    v=pd.Series(['cAt','dog','the rat','mouse','froG'])
    
    [Out]:
    
    0        cAt
    1        dog
    2    the rat
    3      mouse
    4       froG
    

    I want to match every element of v that contains "at" or "Og", so I define my pattern as:

    pattern='at|Og'
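
When the substrings come from a list, as in the question, the pattern can be built by joining them with "|". The sketch below is a minimal illustration, not part of the original answer: the list name `substrings` is hypothetical, and passing each item through `re.escape` is an added precaution so that any regex metacharacters are matched literally.

```python
import re
import pandas as pd

# Hypothetical list of substrings to look for (illustrative only).
substrings = ['at', 'Og']

# Escape each substring so characters such as '.' or '+' match literally,
# then join the alternatives with '|'.
pattern = '|'.join(re.escape(sub) for sub in substrings)

v = pd.Series(['cAt', 'dog', 'the rat', 'mouse', 'froG'])

# Boolean mask: True where an element contains any of the substrings,
# ignoring case.
mask = v.str.contains(pattern, flags=re.IGNORECASE, regex=True)
print(pattern)
print(mask)
```

For these two substrings the joined pattern is the same 'at|Og' string used in the answer.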
    

    Since I want a vector with 1 where the item contains the pattern and 0 where it doesn't, I first create a vector of ones with the same length as v:

    v_binary=[1]*len(v)
    

    I obtain a boolean Series s that is True where an element of v contains the pattern and False where it doesn't:

    s=v.str.contains(pattern, flags=re.IGNORECASE, regex=True)
    

    To obtain the binary vector, I multiply v_binary by s:

    v_binary*s
    
    [Out]
    
    0    1
    1    1
    2    1
    3    0
    4    1
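
As a possible shortcut, the boolean Series can be cast to integers directly instead of being multiplied by a vector of ones. The same boolean mask can also filter dataframe rows, which is what the question asks for. This is a sketch under assumptions: the dataframe `df` and its column name `animal` are hypothetical and do not appear in the original post.

```python
import re
import pandas as pd

v = pd.Series(['cAt', 'dog', 'the rat', 'mouse', 'froG'])
pattern = 'at|Og'

# Boolean mask, ignoring case, as in the answer above.
s = v.str.contains(pattern, flags=re.IGNORECASE, regex=True)

# Cast True/False to 1/0 without building a helper vector of ones.
v_binary = s.astype(int)
print(v_binary.tolist())

# Hypothetical dataframe with a single string column named 'animal'.
df = pd.DataFrame({'animal': v})

# Keep only the rows whose 'animal' value contains one of the substrings;
# case=False is the keyword equivalent of flags=re.IGNORECASE here.
filtered = df[df['animal'].str.contains(pattern, case=False, regex=True)]
print(filtered)
```

If the column may contain missing values, `str.contains` also accepts an `na` argument to decide how they are treated in the mask.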
    
