I simplified my code to the specific problem I am having.
import re
pattern = re.compile(r\'\\bword\\b\')
result = pattern.sub(lambda x: \"match\", \"-word-
Instead of word boundaries, you could also match the character before and after the word with a (\s|^)
and (\s|$)
pattern.
Breakdown: \s
matches every whitespace character, which seems to be what you are trying to achieve, as you are excluding the dashes. The ^
and $
ensure that if the word is either the first or last in the string(ie. no character before or after) those are matched too.
Your code would become something like this:
pattern = re.compile(r'(\s|^)(word)(\s|$)')
result = pattern.sub(r"\1match\3", "-word- word")
Because this solution uses character classes such as \s
, it means that those could be easily replaced or extended. For example if you wanted your words to be delimited by spaces or commas, your pattern would become something like this: r'(,|\s|^)(word)(,|\s|$)'
.
\b
basically denotes a word boundary on characters other than [a-zA-Z0-9_]
which includes spaces as well. Surround word
with negative lookarounds to ensure there is no non-space character after and before it:
re.compile(r'(?<!\S)word(?!\S)')
What you need is a negative lookbehind.
pattern = re.compile(r'(?<!-)\bword\b')
result = pattern.sub(lambda x: "match", "-word- word")
To cite the documentation:
(?<!...)
Matches if the current position in the string is not preceded by a match for ....
So this will only match, if the word-break \b
is not preceded with a minus sign -
.
If you need this for the end of the string you'll have to use a negative lookahead which will look like this: (?!-)
. The complete regular expression will then result in: (?<!-)\bword(?!-)\b