I am doing some text normalization using Python and regular expressions. I would like to substitute every 'u' or 'U' with 'you'. Here is what I have done so far:
Use the special sequence \b, which matches the empty string at the beginning or at the end of a word:
import re
print(re.sub(r'\b[uU]\b', 'you', text))
Matching on surrounding spaces is not a reliable solution, because a word can also be delimited by punctuation marks or by the start or end of the string. The zero-width assertion \b exists to mark a word's beginning or end regardless of what sits next to it.
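To make the behaviour concrete, here is a minimal sketch of the substitution wrapped in a function. The name normalize_u and the sample sentence are made up for illustration and do not come from the question.

```python
import re

# Hypothetical helper name; the pattern is the one from the answer above.
# The raw string (r'...') matters: in a normal string literal, '\b' is a backspace character.
STANDALONE_U = re.compile(r'\b[uU]\b')

def normalize_u(text):
    """Replace a standalone 'u' or 'U' with 'you', leaving longer words untouched."""
    return STANDALONE_U.sub('you', text)

# Illustrative input, not taken from the original post.
sample = "u should see this, U! But unusual stays, and so does menu."
print(normalize_u(sample))
# prints: you should see this, you! But unusual stays, and so does menu.
```

Because \b treats any non-word character as a boundary, an apostrophe also counts: under this pattern "u's" would become "you's", which may or may not be what you want.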