Question
On Python 3.7 (tested on 64-bit Windows), replacing a string with re.sub using the regex .* produces the replacement text twice!
On Python 3.7.2:
>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)(replacement)'
On Python 3.6.4:
>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'
On Python 2.7.5 (32 bits):
>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'
What is wrong? How can I fix it?
Answer 1:
This is not a bug but a bug fix, introduced in Python 3.7 by commit fbb490fd2f38bd817d99c20c05121ad0168a38ee.
In regex, a non-zero-width match moves the pointer to the end of the match, so that the next match attempt, zero-width or not, can continue from the position following the match. So in your example, after .* greedily matches and consumes the entire string, the pointer sits at the end of the string, which still leaves "room" for a zero-width match at that position. This is evident from the following code, which behaves the same in Python 2.7, 3.6 and 3.7:
>>> re.findall(".*", 'sample text')
['sample text', '']
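To see where each of those two matches sits, the spans can be listed with re.finditer. This is only a small illustrative sketch; the variable name matches is not from the original post.

```python
import re

# Collect (span, matched text) for every match of ".*" in the sample string.
# The first match covers the whole string; the second is an empty match
# at the very end of the string.
matches = [(m.span(), m.group()) for m in re.finditer(".*", "sample text")]
print(matches)
```

The second entry has a span whose start and end are equal, which is what "zero-width match" means here.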
Before 3.7, re.sub skipped an empty match that was adjacent to a previous non-empty match. The bug fix makes re.sub replace that empty match as well, so both matches are now correctly replaced with the replacement text.
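As for how to get a single replacement again, a few possible workarounds are sketched below. They are suggestions rather than part of the original answer, and the variable names (text, repl, anchored, non_empty, first_only) are made up for illustration. Each one prevents the trailing empty match from being replaced.

```python
import re

text = "sample text"
repl = "(replacement)"

# Option 1: anchor the pattern. "^" only matches at the start of the string
# (without re.MULTILINE), so the empty match at the end is rejected.
anchored = re.sub(r"^.*$", repl, text)

# Option 2: require at least one character, so an empty match is impossible.
non_empty = re.sub(r".+", repl, text)

# Option 3: keep ".*" but stop after the first replacement.
first_only = re.sub(r".*", repl, text, count=1)

print(anchored, non_empty, first_only)
```

These options are not fully interchangeable: with .+ an empty input string is left unchanged, while the anchored and count=1 variants still replace it once. All three assume single-line input, since . does not match a newline by default.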
Source: https://stackoverflow.com/questions/54713570/re-sub-replacement-text-doubles-replacement-on-python-3-7