How do I add the tag `NEG_` to all words that follow `not`, `no` and `never` until the next punctuation mark in a string (used for
You will need to do this in several steps (at least in Python; .NET languages can use a regex engine with more capabilities, such as variable-length lookbehind):
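First, match the part of the string that starts with `not`, `no` or `never`. The regex `\b(?:not?|never)\b([^.,:;!?]+)` is a good starting point; `not?` covers both `no` and `not`, and the `\b` word boundaries keep it from matching inside words like `nothing`. You might need to add more punctuation characters to the character class if they occur in your texts.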
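A quick way to see what this first step captures is to run the pattern over a sample sentence with Python's `re.finditer`. The sentence and the names `NEGATION` and `text` are only illustrative assumptions:

```python
import re

# The pattern from this answer: "no"/"not" (via `not?`) or "never" as whole
# words, then capture everything up to (not including) the next listed
# punctuation mark.
NEGATION = re.compile(r"\b(?:not?|never)\b([^.,:;!?]+)")

text = "I do not like green eggs, but I never said no to ham!"

for m in NEGATION.finditer(text):
    # group(0) is the whole match; group(1) is the span whose words get tagged.
    print(repr(m.group(0)), "->", repr(m.group(1)))

# 'not like green eggs' -> ' like green eggs'
# 'never said no to ham' -> ' said no to ham'
```

Note that group 1 starts with the space after the keyword, and that the `no` in "said no to ham" is consumed by the earlier `never` match, which is harmless because it gets tagged anyway. The pattern is also case-sensitive, so a sentence-initial "Not" is missed unless you compile it with `re.IGNORECASE`.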
Then, use group 1 of the match as the target of the second step: find all words in it (for example by splitting on whitespace and/or punctuation) and prepend `NEG_` to each of them.
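One possible sketch of this second step treats every whitespace-separated token as a word. Instead of splitting and re-joining, it uses `re.sub` on `\S+` so the original spacing survives unchanged; the function name `tag_words` is a hypothetical helper, not part of any library:

```python
import re

def tag_words(fragment: str, prefix: str = "NEG_") -> str:
    """Prepend `prefix` to every whitespace-separated token in `fragment`.

    Hypothetical helper: a "word" here is any run of non-whitespace
    characters, so contractions such as "don't" stay a single token.
    Whitespace between tokens is left exactly as it was.
    """
    return re.sub(r"\S+", lambda w: prefix + w.group(0), fragment)

print(tag_words(" like green eggs"))  # prints " NEG_like NEG_green NEG_eggs"
```

If you would rather split on punctuation as well, swap `\S+` for a pattern such as `\w+`, keeping in mind that it would break "don't" into two tagged pieces.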
Finally, join the tagged words back together and insert the result into the original string in place of the first regex's match.
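In Python, all three steps can be combined by passing a function to `re.sub`, which splices each returned string back in place of its match. This is a minimal sketch under the same assumptions as the answer (the regex above, with whitespace-separated tokens counted as words); `mark_negation` and `tag_words` are illustrative names:

```python
import re

# Group 1 holds the words after the negation keyword, up to the next
# listed punctuation mark.
NEGATION = re.compile(r"\b(?:not?|never)\b([^.,:;!?]+)")

def tag_words(fragment: str, prefix: str = "NEG_") -> str:
    # Prefix every whitespace-separated token; whitespace itself is untouched.
    return re.sub(r"\S+", lambda w: prefix + w.group(0), fragment)

def mark_negation(text: str) -> str:
    def replace(m: re.Match) -> str:
        # Keep the keyword itself ("not", "no" or "never") unchanged...
        keyword = m.group(0)[: m.start(1) - m.start(0)]
        # ...and tag only the captured words that follow it.
        return keyword + tag_words(m.group(1))

    # re.sub inserts each replacement where the original match was.
    return NEGATION.sub(replace, text)

print(mark_negation("I do not like green eggs, but I never said no to ham!"))
# prints: I do not NEG_like NEG_green NEG_eggs, but I never NEG_said NEG_no NEG_to NEG_ham!
```

Using a replacement function avoids tracking match offsets by hand: `re.sub` handles the "insert the result in place of the match" part, and the punctuation that ended each match is never part of the match, so it stays where it was.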