I need to write a Regular Expression to replace '.' with ',' in some patients' comments about drugs. They were supposed to use comma after men
You can use the following pattern:
\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)
This matches a dot, then captures one or two words, where the first word is not one of the pronouns you mentioned (you will most likely need to expand that list). The captured text has to be followed by a character that is neither a word character nor whitespace (for example a period, exclamation mark, colon, or comma) or by the end of the string.
You then replace each match with ,\1 (a comma followed by the captured text).
In Python:
import re
text = "the drug side-effects are: night mare. nausea. night sweat. bad dream. dizziness. severe headache. I suffered. she suffered. she told I should change it."
text = re.sub(r'\.(\s*(?!(?:i|she)\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)', r',\1', text, flags=re.I)
print(text)
Output:
the drug side-effects are: night mare, nausea, night sweat, bad dream, dizziness, severe headache. I suffered. she suffered. she told I should change it.
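This is not fail-safe: for example, a short sentence that starts with a word missing from the exclusion list (such as "he" or "they") would still get a comma, so you may have to extend the pattern for such edge cases.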
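One way to make the exclusion list easier to expand is to build the lookahead from a list of words. The following is only a minimal sketch: the helper name dots_to_commas and the extra pronouns in its default list are assumptions for illustration, not something taken from your data, so adjust them to the comments you actually have.

```python
import re

# Hypothetical helper: the name and the default word list are illustrative
# assumptions, not part of the pattern shown earlier.
def dots_to_commas(text, excluded=("i", "she", "he", "we", "they", "it")):
    # Build the alternation used inside the negative lookahead.
    alternation = "|".join(re.escape(word) for word in excluded)
    pattern = (
        r"\.(\s*(?!(?:" + alternation + r")\b)\w+(?:\s+\w+)?\s*)(?=[^\w\s]|$)"
    )
    # Same replacement as before: a comma followed by the captured words.
    return re.sub(pattern, r",\1", text, flags=re.I)

sample = "side-effects: night mare. nausea. bad dream. he suffered. they stopped."
print(dots_to_commas(sample))
# prints: side-effects: night mare, nausea, bad dream. he suffered. they stopped.
```

Passing a different tuple as excluded lets you try other word lists without editing the pattern by hand.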