I have a list of 'words' I want to count below
word_list = ['one', 'three']
And I have a text column in a pandas DataFrame, shown below.
Use str.extract:
import re  # needed for the re.IGNORECASE flag

df['EXTRACT'] = df.TEXT.str.extract('({})'.format('|'.join(word_list)),
                                    flags=re.IGNORECASE, expand=False).str.lower().fillna('')
df['EXTRACT']
0 one
1 one
2 three
3 three
4 one
5 one
6 one
7
8
Name: EXTRACT, dtype: object
Each word in word_list is joined with the regex alternation operator | and the resulting pattern is passed to str.extract for matching. The re.IGNORECASE flag makes the comparison case-insensitive, and the matches are then lowercased to match your expected output. Rows with no match come back as NaN, which fillna('') replaces with an empty string.
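For reference, here is a minimal self-contained sketch of the same approach. The sample rows in TEXT are hypothetical, since the original column is not reproduced here, and the re.escape step is an addition of mine that only matters if a word in word_list contains regex metacharacters.

```python
import re

import pandas as pd

word_list = ['one', 'three']

# Hypothetical sample data standing in for the question's TEXT column.
df = pd.DataFrame(
    {'TEXT': ['One apple', 'three pears and one plum', 'nothing here', None]}
)

# Build '(one|three)'. str.extract needs a capture group, and re.escape
# (an assumption on my part, not in the original answer) keeps any regex
# metacharacters in the words literal.
pattern = '({})'.format('|'.join(re.escape(w) for w in word_list))

df['EXTRACT'] = (
    df['TEXT']
    .str.extract(pattern, flags=re.IGNORECASE, expand=False)  # first match per row
    .str.lower()                                              # normalise case
    .fillna('')                                               # no match -> ''
)

print(df['EXTRACT'].tolist())
```

Note that str.extract returns only the first match in each row, so the second row above yields 'three' even though 'one' also appears later in the text.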