Question
I was trying to answer this question, where the OP has the following string:
"path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
and wants to split it to obtain the following list:
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
I tried to solve it with a simple lookahead assertion in a regex, (?=path:). Well, it did not work:
>>> import re
>>> s = "path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism"
>>> r = re.compile('(?=path:)')
>>> r.split(s)
['path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism']
However, in this answer, the answerer got it working by preceding the lookahead assertion with a space:
>>> line = 'path:bte00250 Alanine, aspartate and glutamate metabolism path:bte00330 Arginine and proline metabolism'
>>> re.split(' (?=path:)', line)
['path:bte00250 Alanine, aspartate and glutamate metabolism', 'path:bte00330 Arginine and proline metabolism']
Why did the regex work with the whitespace? Why did it not work without the whitespace?
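Answer 1:
Python's re.split() had a documented limitation at the time: it did not split on zero-length matches. The lookahead (?=path:) matches a position without consuming any characters, so every match was empty and the string came back unsplit. With the added space, each match consumes one character, so the split works. Since Python 3.7, re.split() also splits on empty matches.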
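To make the difference concrete, here is a minimal sketch of three ways to get the desired list. The variable names (by_space, by_findall, by_lookahead) are only illustrative, and the third variant assumes Python 3.7 or later, where re.split() accepts empty matches.

```python
import re

s = ("path:bte00250 Alanine, aspartate and glutamate metabolism "
     "path:bte00330 Arginine and proline metabolism")

# Each match consumes the space before "path:", so the match is never empty.
by_space = re.split(r' (?=path:)', s)

# Alternative that avoids split(): match each entry lazily up to the next
# " path:" or the end of the string.
by_findall = re.findall(r'path:.*?(?= path:|$)', s)

# Assumes Python 3.7+: the zero-width lookahead now splits, but it leaves an
# empty first piece and a trailing space, so filter and strip.
by_lookahead = [part.strip() for part in re.split(r'(?=path:)', s) if part]

print(by_space)
print(by_findall)
print(by_lookahead)
```

On an older interpreter the third variant would still run, but it would return the whole string as a single element, which is the behaviour the question describes.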
Source: https://stackoverflow.com/questions/6712855/two-very-close-regexes-with-lookahead-assertions-in-python-why-does-re-split