Extract substring from text in a pandas DataFrame as new column

挽巷 2021-02-10 10:50

I have a list of 'words' that I want to count, shown below:

word_list = ['one', 'three']

And I have a column in a pandas DataFrame containing the text below.

1 answer
  • 2021-02-10 11:33

    Use str.extract:

    import re

    # Build an alternation pattern such as '(one|three)' and keep the first match per row
    df['EXTRACT'] = df.TEXT.str.extract('({})'.format('|'.join(word_list)),
                            flags=re.IGNORECASE, expand=False).str.lower().fillna('')
    df['EXTRACT']
    
    0      one
    1      one
    2    three
    3    three
    4      one
    5      one
    6      one
    7         
    8         
    Name: EXTRACT, dtype: object
    

    Each word in word_list is joined with the regex alternation operator |, and the resulting pattern is passed to str.extract, which returns the first match in each row.

    The re.IGNORECASE flag makes the match case-insensitive, and the matches are then lowercased to match your expected output. Rows with no match yield NaN, which fillna('') replaces with an empty string.
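For anyone who wants to try this end to end, here is a minimal self-contained sketch. The sample rows in TEXT are hypothetical, since the question's actual data is not shown, and the re.escape call is an addition that is not in the answer above; it only matters if a word in word_list contains regex metacharacters.

```python
import re

import pandas as pd

# Hypothetical sample data; the question's actual TEXT column is not shown.
word_list = ['one', 'three']
df = pd.DataFrame({'TEXT': ['One apple', 'THREE pears', 'two plums', 'one and three']})

# Join the words with the alternation operator inside one capture group.
# re.escape (an addition here) guards against regex metacharacters in the words.
pattern = '({})'.format('|'.join(re.escape(w) for w in word_list))

# expand=False returns a Series; rows without a match are NaN, then become ''.
df['EXTRACT'] = (
    df['TEXT']
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)
    .str.lower()
    .fillna('')
)
print(df['EXTRACT'].tolist())  # ['one', 'three', '', 'one']
```

Note that the pattern matches substrings, so a row containing 'someone' would also yield 'one'. If whole words are intended, wrapping the alternation in \b word boundaries is one way to avoid that.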
