Backslash escape sequences and word boundaries in Python regex

情深已故 2021-01-16 00:24

Currently using re.sub(re.escape("andrew)"), "SUB", stringVar)

Intended behavior:

stringVar = " andrew) "
re.sub(re.escape("andr


        
1 answer
  • 2021-01-16 00:54

    From the Python re module documentation:

    \b

    Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of alphanumeric or underscore characters, so the end of a word is indicated by whitespace or a non-alphanumeric, non-underscore character. Note that formally, \b is defined as the boundary between a \w and a \W character (or vice versa), or between \w and the beginning/end of the string, so the precise set of characters deemed to be alphanumeric depends on the values of the UNICODE and LOCALE flags. For example, r'\bfoo\b' matches 'foo', 'foo.', '(foo)', 'bar foo baz' but not 'foobar' or 'foo3'.
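
The examples listed in the quoted documentation can be checked directly. The snippet below is only a small sketch; the variable names are chosen for illustration, while the pattern and sample strings are taken from the quote.

```python
import re

# Pattern and sample strings come from the documentation text quoted above.
pattern = re.compile(r'\bfoo\b')

should_match = ['foo', 'foo.', '(foo)', 'bar foo baz']
should_not_match = ['foobar', 'foo3']

# Collect every sample in which the pattern is found somewhere.
matched = [s for s in should_match + should_not_match if pattern.search(s)]
print(matched)  # ['foo', 'foo.', '(foo)', 'bar foo baz']
```

In 'foobar' and 'foo3' the character after 'foo' is itself a word character, so there is no boundary for the second \b to match.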

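    In your case the word boundary falls between andrew and ), because ) is the first non-alphanumeric, non-underscore character. A \b placed after the escaped ')' cannot match when ')' is followed by a space, since both are non-word characters and there is no boundary between them. The example below illustrates what happens if you include or exclude ')' from the escape.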

    >>> import re
    >>> stringVar = " andrew) "
    >>> re.sub(r'\b%s\b' % re.escape("andrew)"), "SUB", stringVar)
    ' andrew) '
    >>> re.sub(r'\b%s\b' % re.escape("andrew"), "SUB", stringVar)
    ' SUB) '
    >>> stringVar = "zzzandrew)zzz"
    >>> re.sub(r'\b%s\b' % re.escape("andrew"), "SUB", stringVar)
    'zzzandrew)zzz'
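
To see why the trailing \b is the part that fails, it may help to print the pattern that re.escape produces and to try it against a string where ')' is followed by a word character. This is a sketch for illustration; the second test string is made up and is not from the question.

```python
import re

# re.escape puts a backslash in front of ")" so that it is matched literally.
pattern = r'\b%s\b' % re.escape("andrew)")
print(pattern)  # \bandrew\)\b

# ")" followed by a space: both are non-word characters, so the final \b fails.
no_match = re.search(pattern, " andrew) ")
print(no_match)  # None

# ")" followed by a word character: the final \b matches between ")" and "x".
match = re.search(pattern, " andrew)x ")
print(match.group())  # andrew)
```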
    

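    If ')' has to be part of the escaped string, you can replace the trailing \b with a positive lookahead assertion, as below. It matches only if 'andrew)' is followed by a whitespace character (\s) or, more generally, by any non-word character (\W), that is, anything other than a letter, digit or underscore. The lookahead does not consume that following character.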

    >>> stringVar = " andrew) "
    >>> re.sub(r'\b%s(?=\s)' % re.escape("andrew)"), "SUB", stringVar)
    ' SUB '
    >>> stringVar = "zzzandrew)zzz"
    >>> re.sub(r'\b%s(?=\s)' % re.escape("andrew)"), "SUB", stringVar)
    'zzzandrew)zzz'
    >>> stringVar = " andrew) "
    >>> re.sub(r'\b%s(?=\W)' % re.escape("andrew)"), "SUB", stringVar)
    ' SUB '
    >>> stringVar = "zzzandrew)zzz"
    >>> re.sub(r'\b%s(?=\W)' % re.escape("andrew)"), "SUB", stringVar)
    'zzzandrew)zzz'
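
One limitation of (?=\s) and (?=\W) is that they need a character to look at, so they do not match when 'andrew)' sits at the very end of the string. A possible alternative, sketched below, is to use negative lookarounds on both sides. The helper name replace_literal is hypothetical and not part of the re module; it assumes "not touching a word character" is the condition you want.

```python
import re

def replace_literal(text, literal, replacement):
    """Replace `literal` in `text` when it is not glued to word characters.

    Hypothetical helper for illustration only.
    """
    # (?<!\w): no word character immediately before the literal.
    # (?!\w):  no word character immediately after it (end of string is fine).
    pattern = r'(?<!\w)%s(?!\w)' % re.escape(literal)
    # A function replacement keeps any backslashes in `replacement` literal.
    return re.sub(pattern, lambda match: replacement, text)

print(repr(replace_literal(" andrew) ", "andrew)", "SUB")))      # ' SUB '
print(repr(replace_literal("zzzandrew)zzz", "andrew)", "SUB")))  # 'zzzandrew)zzz'
print(repr(replace_literal("andrew)", "andrew)", "SUB")))        # 'SUB'
```

The last call is the end-of-string case, where the (?=\s) and (?=\W) versions would leave the string unchanged.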
    