Question
I have a list of words, negative, that has 4783 elements. I want to use the following code:
tweets3 = tweets2[tweets2['full_text'].str.contains('|'.join(negative))]
But it gives an error like this: error: multiple repeat at position 4193.
I do not understand this error. If I use a single word in str.contains, such as str.contains("deal"), I do get results.
All I need is a new dataframe that contains only those rows of tweets2 whose full_text column contains any of the words in negative.
Optionally, I would also like a boolean column that marks each row as 1 (present) or 0 (absent).
I arrived at using the following code with the help of @wp78de:
tweets2['negative'] = tweets2.loc[tweets2['full_text'].str.contains(r'(?:{})'.format('|'.join(negative)), regex=True, na=False)].copy()
Answer 1:
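For arbitrary literal strings that may contain regular expression metacharacters, you can use the re.escape() function. Escape each word separately, before joining, so that the | separators keep their meaning as alternation; escaping the already joined string would turn them into literal pipe characters. Something along these lines should be sufficient:
.str.contains(r'(?:{})'.format('|'.join(map(re.escape, words))), regex=True, na=False)]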
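A minimal sketch of this approach may help. It assumes a tiny made-up tweets2 dataframe and a three-word negative list, both hypothetical stand-ins for the real data. One of the words, "a**", contains regex metacharacters, which is the kind of entry that likely triggers "multiple repeat" in the real list. The sketch escapes each word on its own, filters the rows, and adds the optional 0/1 column.

```python
import re

import pandas as pd

# Hypothetical stand-ins for the real data in the question.
tweets2 = pd.DataFrame(
    {"full_text": ["what a great deal", "this is awful", "a** rated", None]}
)
negative = ["awful", "a**", "terrible"]  # "a**" holds regex metacharacters

# Joining the raw words reproduces the error: "**" is a repeat of a repeat.
try:
    re.compile("|".join(negative))
    raw_error = None
except re.error as exc:
    raw_error = str(exc)

# Escape each word separately so the "|" separators stay as alternation.
pattern = r"(?:{})".format("|".join(map(re.escape, negative)))

# Boolean mask; na=False treats missing text as "no match".
mask = tweets2["full_text"].str.contains(pattern, regex=True, na=False)

# New dataframe with only the matching rows.
tweets3 = tweets2[mask].copy()

# Optional 0/1 indicator column on the original dataframe.
tweets2["negative"] = mask.astype(int)
```

In this sketch tweets3 keeps only the rows whose text contains one of the words literally, and tweets2["negative"] holds 1 for those rows and 0 for the rest, including the row with missing text.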
Source: https://stackoverflow.com/questions/60576936/find-any-word-of-a-list-in-the-column-of-dataframe