Impossible lookbehind with a backreference

一向 2020-12-31 10:00

From my understanding,

(.)(?

should never match. Actually, PHP's preg_replace even refuses to compile this and so does

1 answer
  • 2020-12-31 11:01

    This does look like a limitation (nice way of saying "bug", as I learned from a support call with Microsoft) in the Python re module.

    I guess it has to do with the fact that Python does not support variable-length lookbehind assertions, but it's not clever enough to figure out that \1 will always be fixed-length. Why it doesn't complain about this when compiling the regex, I can't say.
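
To see how a given interpreter treats these patterns at compile time, something like the following can help. It is only a sketch: `compile_status` is a hypothetical helper name, and the results for the backreference patterns depend on the Python version, so no output is shown for them.

```python
import re

def compile_status(pattern):
    """Return 'ok' if `pattern` compiles, else the re.error message.

    Hypothetical helper for illustration; not part of the re module.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return f"error: {exc}"
    return "ok"

# The first two patterns come from this thread; the third is an
# ordinary variable-length lookbehind, added here for comparison.
for pattern in (r"(.)(?<!\1)", r"(.*)(?<!\1)", r"(?<=a+)b"):
    print(pattern, "->", compile_status(pattern))
```

The third pattern is there as a baseline: a plain variable-length lookbehind such as `(?<=a+)` is rejected by `re`, so comparing it with the two backreference patterns shows whether the width check also covers `\1` on your version.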

    Funnily enough:

    >>> print (re.sub(r'.(?<!\0)', r'(\g<0>)', test))
    (x)(A)(A)(A)(A)(A)(y)(B)(B)(B)(B)(z)
    >>>
    >>> re.compile(r'(.*)(?<!\1)') # This should trigger an error but doesn't!
    <_sre.SRE_Pattern object at 0x00000000026A89C0>
    

    So it's better not to use backreferences in lookbehind assertions in Python. Positive lookbehind isn't much better (here it also matches as if it were a positive lookahead):

    >>> print (re.sub(r'(.)(?<=\1)', r'(\g<0>)', test))
    x(A)(A)(A)(A)Ay(B)(B)(B)Bz
    

    And I can't even guess what's going on here:

    >>> print (re.sub(r'(.+)(?<=\1)', r'(\g<0>)', test))
    x(AA)(A)(A)Ay(BB)(B)Bz
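
For comparison, here is what the explicit lookahead versions of the last two patterns produce. This is a sketch under one assumption: the `test` string is never shown in this thread, so it is reconstructed here as `xAAAAAyBBBBz` from the outputs above. `wrap_matches` is a hypothetical helper name.

```python
import re

# Assumed input, reconstructed from the outputs quoted above.
test = "xAAAAAyBBBBz"

def wrap_matches(pattern, text):
    """Wrap every match of `pattern` in parentheses."""
    return re.sub(pattern, r"(\g<0>)", text)

# Same patterns as above, with (?<=\1) replaced by the lookahead (?=\1).
single = wrap_matches(r"(.)(?=\1)", test)
multi = wrap_matches(r"(.+)(?=\1)", test)

print(single)  # x(A)(A)(A)(A)Ay(B)(B)(B)Bz
print(multi)   # x(AA)(A)(A)Ay(BB)(B)Bz
```

Both strings are identical to the lookbehind outputs quoted above, which fits the guess that the backreference inside the lookbehind was being evaluated like a lookahead. If a lookahead is what you actually mean, writing it explicitly avoids relying on that behavior.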
    