re.sub(".*", "(replacement)", "text") doubles replacement on Python 3.7

Submitted by 对着背影说爱祢 on 2019-12-05 17:49:22

Question


On Python 3.7 (tested on 64-bit Windows), replacing a string with re.sub using the regex .* produces the replacement string twice!

On Python 3.7.2:

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)(replacement)'

On Python 3.6.4:

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'

On Python 2.7.5 (32 bits):

>>> import re
>>> re.sub(".*", "(replacement)", "sample text")
'(replacement)'

What is wrong? How can I fix this?


Answer 1:


This is not a bug but the result of a bug fix in Python 3.7, made in commit fbb490fd2f38bd817d99c20c05121ad0168a38ee.

In regex, a non-zero-width match moves the pointer to the end of the match, so that the next match attempt, zero-width or not, continues from the position following it. In your example, .* greedily matches and consumes the entire string, which moves the pointer to the end of the string. That still leaves "room" for a zero-width match at that position, as is evident from the following code, which behaves the same in Python 2.7, 3.6 and 3.7:

>>> re.findall(".*", 'sample text')
['sample text', '']
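
To make the two matches easier to see, here is a small sketch (not part of the original answer) that prints the span of each match instead of its text. The variable names `text` and `spans` are only illustrative.

```python
import re

text = "sample text"

# Collect the (start, end) span of every match of ".*" in the text.
# The first match covers the whole string; the second is the empty
# match that is still possible at the very end of the string.
spans = [m.span() for m in re.finditer(".*", text)]
print(spans)  # [(0, 11), (11, 11)]
```

The second span starts and ends at index 11, the end of the string, which is the zero-width match described above.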

So the bug fix, which concerns replacing a zero-width match that immediately follows a non-zero-width match, means re.sub now correctly replaces both matches with the replacement text.
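
As for how to get a single replacement, the original answer does not give one, so the following is only a sketch of a few possible workarounds, assuming the goal is to replace a non-empty, single-line string exactly once. The variable names are hypothetical; which option fits best depends on the actual input.

```python
import re

text = "sample text"

# Option 1: anchor the pattern. Without re.MULTILINE, "^" only matches at
# the start of the string, so no second match is possible at the end.
anchored = re.sub(r"^.*$", "(replacement)", text)

# Option 2: require at least one character, so the empty match at the
# end of the string no longer qualifies.
non_empty = re.sub(r".+", "(replacement)", text)

# Option 3: keep ".*" but cap the number of substitutions with count=1.
first_only = re.sub(r".*", "(replacement)", text, count=1)

print(anchored, non_empty, first_only)
```

Note that the options are not equivalent for every input: for an empty string, option 2 replaces nothing, while options 1 and 3 still insert the replacement once.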



Source: https://stackoverflow.com/questions/54713570/re-sub-replacement-text-doubles-replacement-on-python-3-7
