How to add tags to negated words in strings that follow “not”, “no” and “never”

醉话见心 2021-01-03 12:01

How do I add the tag NEG_ to all words that follow not, no and never, up to the next punctuation mark in a string (used for

3 answers
  •  悲哀的现实
    2021-01-03 12:13

    You will need to do this in several steps (at least in Python - .NET languages can use a regex engine that has more capabilities):

    • First, match a part of a string starting with not, no or never. The regex \b(?:not?|never)\b([^.,:;!?]+) would be a good starting point. You might need to add more punctuation characters to that list if they occur in your texts.

    • Then, use the match result's group 1 as the target of your second step: Find all words (for example by splitting on whitespace and/or punctuation) and prepend NEG_ to them.

    • Join the string together again and insert the result in your original string in the place of the first regex's match.
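The three steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a definitive implementation: the function name `tag_negated`, the `tag` parameter and the case-insensitive flag are assumptions added here, and the trigger word is captured in its own group (the regex above uses a non-capturing group) so that it can be written back unchanged. Words are taken to be runs of non-whitespace characters.

```python
import re

# Step 1: a clause that starts at "no", "not" or "never" and runs up to the
# next punctuation mark. Group 1 is the trigger word, group 2 is the scope.
# re.IGNORECASE is an assumption so that "Never" at sentence start also matches.
NEGATION = re.compile(r"\b(not?|never)\b([^.,:;!?]+)", re.IGNORECASE)


def tag_negated(text, tag="NEG_"):
    """Prefix every word in a negation scope with `tag` (hypothetical helper)."""

    def _tag_clause(match):
        trigger, scope = match.group(1), match.group(2)
        # Step 2: prepend the tag to each whitespace-delimited word in the
        # scope; the whitespace between words is left as it was.
        tagged = re.sub(r"\S+", lambda word: tag + word.group(0), scope)
        # Step 3: put the trigger and the tagged scope back in place.
        return trigger + tagged

    return NEGATION.sub(_tag_clause, text)


print(tag_negated("I do not like this movie, but the cast is good."))
```

With this sketch the example prints `I do not NEG_like NEG_this NEG_movie, but the cast is good.` Passing a function to `re.sub` handles the "insert the result back" step without manual slicing; extend the punctuation class if your texts use other delimiters.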
