Check if a string in a Pandas DataFrame column is in a list of strings

夕颜 2020-11-27 14:59

If I have a frame like this

frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})

and I want to

4 Answers
  • 2020-11-27 15:24

    After reading the comments on the accepted answer about extracting the matched string, here is another approach you can try.

    frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
    
    frame
                  a
    0   the cat is blue
    1  the sky is green
    2  the dog is black
    

    Let us create the list of strings that need to be matched and extracted.

    mylist = ['dog', 'cat', 'fish']
    pattern = '|'.join(mylist)
    

    Now let us create a function that finds and extracts the matching substring.

    import re

    def pattern_searcher(search_str: str, search_list: str) -> str:
        # search_list is the regex alternation built above, e.g. 'dog|cat|fish'
        search_obj = re.search(search_list, search_str)
        if search_obj:
            return_str = search_obj.group()  # the matched substring
        else:
            return_str = 'NA'
        return return_str
    

    We will use this function with pandas.Series.apply, since frame['a'] is a Series:

    frame['matched_str'] = frame['a'].apply(lambda x: pattern_searcher(search_str=x, search_list=pattern))
    

    Result :

                      a matched_str
    0   the cat is blue         cat
    1  the sky is green          NA
    2  the dog is black         dog
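The same extraction can probably be done without a custom function. The following is a minimal sketch, assuming the list items should be matched literally (hence `re.escape`) and that `'NA'` is still the desired placeholder; it relies on `Series.str.extract`, which returns the first capture group of the pattern.

```python
import re
import pandas as pd

frame = pd.DataFrame({'a': ['the cat is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']

# Escape each term so regex metacharacters are matched literally,
# then wrap the alternation in one capture group for str.extract.
pattern = '(' + '|'.join(re.escape(s) for s in mylist) + ')'

# expand=False returns a Series; rows without a match come back as NaN,
# which is replaced here with the same 'NA' placeholder used above.
frame['matched_str'] = frame['a'].str.extract(pattern, expand=False).fillna('NA')
print(frame)
```

This avoids a Python-level call per row, which may matter on large frames.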
    
  • 2020-11-27 15:25

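    For a list, isin should work, but note that it tests whether each whole cell value equals one of the list items, so it does not find substrings inside longer strings:

    print(frame[frame['a'].isin(mylist)])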
    

    http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.isin.html
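To illustrate the difference between exact matching and substring matching, here is a small sketch. The `sentences` and `animals` Series are made up for the comparison and are not part of the original question.

```python
import pandas as pd

mylist = ['dog', 'cat', 'fish']

# Illustrative data: full sentences versus bare values.
sentences = pd.Series(['the cat is blue', 'the sky is green', 'the dog is black'])
animals = pd.Series(['cat', 'sky', 'dog'])

# isin compares each whole value against the list, so no sentence matches.
sentence_hits = sentences.isin(mylist)

# When the values are exactly the list items, isin finds them.
animal_hits = animals.isin(mylist)

print(sentence_hits.tolist())  # [False, False, False]
print(animal_hits.tolist())    # [True, False, True]
```

So `isin` fits columns that hold the exact values, while the sentence column from the question needs a substring or regex test.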

  • 2020-11-27 15:27
    frame = pd.DataFrame({'a' : ['the cat is blue', 'the sky is green', 'the dog is black']})
    
    frame
                      a
    0   the cat is blue
    1  the sky is green
    2  the dog is black
    

    The str.contains method accepts a regular expression pattern:

    mylist = ['dog', 'cat', 'fish']
    pattern = '|'.join(mylist)
    
    pattern
    'dog|cat|fish'
    
    frame.a.str.contains(pattern)
    0     True
    1    False
    2     True
    Name: a, dtype: bool
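Because the joined string is treated as a regular expression, list items containing characters such as `$`, `(` or `.` would be interpreted as regex syntax. A hedged sketch of one way to guard against that, and against missing values, is shown below; the sample data and the `terms` list are invented for illustration.

```python
import re
import pandas as pd

# Hypothetical data: one missing value and terms with regex metacharacters.
frame = pd.DataFrame({'a': ['the cat is blue', None, 'price is $5 (approx.)']})
terms = ['cat', '$5', '(approx.)']

# re.escape makes each term match literally before joining with '|'.
pattern = '|'.join(re.escape(t) for t in terms)

# na=False makes missing values count as "no match" instead of NaN,
# so the result can be used directly as a boolean mask.
mask = frame['a'].str.contains(pattern, na=False)
print(mask.tolist())  # [True, False, True]
```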
    

    Because regex patterns are supported, you can also embed flags:

    frame = pd.DataFrame({'a' : ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
    
    frame
                         a
    0  Cat Mr. Nibbles is blue
    1         the sky is green
    2         the dog is black
    
    pattern = '(?i)' + '|'.join(mylist)  # a single global flag at the start; flags placed mid-pattern raise an error on Python 3.11+
    
    pattern
    '(?i)dog|cat|fish'
    
    frame.a.str.contains(pattern)
    0     True  # Because of the (?i) flag, 'Cat' is also matched to 'cat'
    1    False
    2     True
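As an alternative to embedding a flag in the pattern, `str.contains` also takes `case` and `flags` parameters. A minimal sketch using the same frame and list:

```python
import re
import pandas as pd

frame = pd.DataFrame({'a': ['Cat Mr. Nibbles is blue', 'the sky is green', 'the dog is black']})
mylist = ['dog', 'cat', 'fish']
pattern = '|'.join(mylist)

# case=False asks pandas to ignore case when matching.
by_case = frame['a'].str.contains(pattern, case=False)

# Equivalent here: pass the re module flag explicitly.
by_flags = frame['a'].str.contains(pattern, flags=re.IGNORECASE)

print(by_case.tolist())   # [True, False, True]
print(by_flags.tolist())  # [True, False, True]
```

Keeping the flag out of the pattern string leaves the pattern itself reusable for case-sensitive matching.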
    
  • 2020-11-27 15:32

    We can check for three patterns simultaneously by joining them with a pipe. For example, with the text in column 1 and the result written to column 2:

    import re

    for i in range(len(df)):
        if re.search(r'car|oxide|gen', df.iat[i, 1]):
            df.iat[i, 2] = 'Yes'
        else:
            df.iat[i, 2] = 'No'
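The row-by-row loop can likely be replaced with a vectorized expression. The sketch below assumes a hypothetical `df` whose column names (`id`, `text`, `flag`) and contents are invented for illustration; it combines `str.contains` with `numpy.where`.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'text' stands in for column 1 and 'flag' for column 2.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'text': ['carbon dioxide', 'helium', 'nitrogen'],
    'flag': ['', '', ''],
})

# str.contains builds a boolean mask for the whole column in one call;
# np.where then maps True to 'Yes' and False to 'No'.
df['flag'] = np.where(df['text'].str.contains(r'car|oxide|gen'), 'Yes', 'No')
print(df['flag'].tolist())  # ['Yes', 'No', 'Yes']
```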
    