How to make word boundary \b not match on dashes

滥情空心 2021-01-05 04:05

I simplified my code to the specific problem I am having.

import re
pattern = re.compile(r'\bword\b')
result = pattern.sub(lambda x: "match", "-word-


        
3 Answers
  • 2021-01-05 04:35

    Instead of word boundaries, you could also match the character before and after the word with a (\s|^) and (\s|$) pattern.

    Breakdown: \s matches any whitespace character, which seems to be what you want, since you are excluding the dashes. The ^ and $ alternatives ensure that the word is also matched when it is the first or last thing in the string (i.e., no character before or after it).

    Your code would become something like this:

    pattern = re.compile(r'(\s|^)(word)(\s|$)')
    result = pattern.sub(r"\1match\3", "-word- word")
    

    Because this solution spells out the delimiters explicitly (here \s), they can easily be replaced or extended. For example, if you wanted your words to be delimited by spaces or commas, your pattern would become something like this: r'(,|\s|^)(word)(,|\s|$)'.
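The capture-group approach above can be checked with a short sketch. The helper name `replace_delimited`, its parameters, and the sample strings are made up for illustration and are not part of the answer. One caveat to be aware of: the groups consume the delimiter, so when two occurrences share a single space, only the first is replaced.

```python
import re

def replace_delimited(text, word="word", repl="match", delims=r"\s"):
    """Replace `word` only when delimited by `delims` or the string edges.

    Hypothetical helper; `delims` is the inside of a character class.
    """
    pattern = re.compile(rf"([{delims}]|^)({re.escape(word)})([{delims}]|$)")
    # Put the captured delimiters back around the replacement.
    return pattern.sub(lambda m: m.group(1) + repl + m.group(3), text)

print(replace_delimited("-word- word"))               # -word- match
print(replace_delimited("a,word,b", delims=r",\s"))   # a,match,b
print(replace_delimited("word word"))                 # match word (shared space is consumed)
```

If adjacent occurrences matter, zero-width lookarounds avoid the problem because they do not consume the delimiter.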

  • 2021-01-05 04:38

    \b matches at any position between a word character (\w, i.e. [a-zA-Z0-9_] for ASCII text) and a non-word character, so it matches next to a dash just as it does next to a space. Surround word with negative lookarounds to ensure there is no non-space character before or after it:

    re.compile(r'(?<!\S)word(?!\S)')
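
As a rough illustration of how this pattern behaves, here is a small sketch; the sample strings are invented for the demonstration. Because the lookarounds are zero-width, neighbouring occurrences that share a space are both replaced.

```python
import re

# Match "word" only when both neighbours are whitespace or a string edge.
pattern = re.compile(r"(?<!\S)word(?!\S)")

samples = ["-word- word", "word word", "word, word-", "sword word"]
results = [pattern.sub("match", s) for s in samples]

for s, r in zip(samples, results):
    print(f"{s!r} -> {r!r}")
# '-word- word' -> '-word- match'
# 'word word'   -> 'match match'
# 'word, word-' -> 'word, word-'   (comma and dash both block the match)
# 'sword word'  -> 'sword match'
```

Note that this is stricter than excluding dashes only: punctuation such as a trailing comma also prevents a match.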
    
  • 2021-01-05 05:00

    What you need is a negative lookbehind.

    pattern = re.compile(r'(?<!-)\bword\b')
    result = pattern.sub(lambda x: "match", "-word- word")
    

    To cite the documentation:

    (?<!...) Matches if the current position in the string is not preceded by a match for ....

    So this only matches if the word boundary \b is not preceded by a minus sign (-).

    If you also need this at the end of the word, you'll have to use a negative lookahead, which looks like this: (?!-). The complete regular expression then becomes: (?<!-)\bword(?!-)\b
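A brief sketch of the complete expression in use, with sample strings chosen purely for illustration. It keeps the ordinary \b behaviour and only vetoes matches that touch a dash, so other punctuation still counts as a boundary.

```python
import re

# \b still applies; a dash directly before or after the word blocks the match.
pattern = re.compile(r"(?<!-)\bword(?!-)\b")

samples = ["-word- word", "word-", "-word", "(word), word.", "words"]
results = [pattern.sub("match", s) for s in samples]

for s, r in zip(samples, results):
    print(f"{s!r} -> {r!r}")
# '-word- word'   -> '-word- match'
# 'word-'         -> 'word-'
# '-word'         -> '-word'
# '(word), word.' -> '(match), match.'
# 'words'         -> 'words'
```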
