Question
I use Python regular expressions (the re module) in my code and noticed different behaviour in these cases:
re.findall(r'\s*(?:[a-z]\))?[^.)]+', 'a) xyz. b) abc.') # non-capturing group
# results in ['a) xyz', ' b) abc']
and
re.findall(r'\s*(?<=[a-z]\))?[^.)]+', 'a) xyz. b) abc.') # lookbehind
# results in ['a', ' xyz', ' b', ' abc']
What I need to get is just ['xyz', 'abc']. Why do the examples behave differently, and how do I get the desired result?
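Answer 1:
The reason a and b are included in the second case is that the lookbehind (?<=[a-z]\)) is zero-width: lookarounds don't consume any characters, so it can never take a) out of the match the way the non-capturing group does. At the start of the string nothing precedes the current position, so the optional lookbehind is skipped and [^.)]+ matches a. The match stops at ). Because you have made (?<=[a-z]\)) optional, the regex can then match again without it, and \s*[^.)]+ matches " xyz". The same thing is repeated with b) abc.
Remove the ? from the second case and you would get the expected result, i.e. ['xyz', 'abc'].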
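One way to check the explanation above is a minimal sketch like the following. It assumes that an optional lookbehind which is never required to succeed behaves like no lookbehind at all, so the question's pattern is compared with the same pattern with the lookbehind deleted. The variable names are illustrative only.

```python
import re

text = 'a) xyz. b) abc.'

# The pattern from the question: the lookbehind is optional, so the
# engine is free to skip it wherever it does not hold.
optional = re.findall(r'\s*(?<=[a-z]\))?[^.)]+', text)

# The same pattern with the lookbehind removed entirely.
without = re.findall(r'\s*[^.)]+', text)

print(optional)  # ['a', ' xyz', ' b', ' abc']
print(without)   # ['a', ' xyz', ' b', ' abc']
```

Both calls return the same list for this input, which is consistent with the optional lookbehind having no effect on the match.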
Answer 2:
The regex you are looking for is:
re.findall(r'(?<=[a-z]\) )[^) .]+', 'a) xyz. b) abc.')
I believe the currently accepted answer by Anirudha explains the difference between your use of a positive lookbehind and a non-capturing group well. However, the suggestion of removing the ? after the positive lookbehind actually results in [' xyz', ' abc'] (note the included spaces).
This is because the positive lookbehind does not cover the space after the parenthesis, and the negated character class [^.)] does not exclude the space either, so the space ends up in the match.
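The difference can be seen side by side in a short sketch. It uses only the sample string from the question, and the variable names are illustrative.

```python
import re

text = 'a) xyz. b) abc.'

# Lookbehind made mandatory (the ? removed): the space after ")" is
# not part of the lookbehind and not excluded by [^.)], so it is matched.
mandatory = re.findall(r'\s*(?<=[a-z]\))[^.)]+', text)

# Space moved into the lookbehind and excluded from the character class.
with_space = re.findall(r'(?<=[a-z]\) )[^) .]+', text)

print(mandatory)   # [' xyz', ' abc']
print(with_space)  # ['xyz', 'abc']
```

Note that excluding the space in the character class also means each match stops at the first space, which is fine for single-word items like the ones in this sample.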
Source: https://stackoverflow.com/questions/14692395/positive-lookbehind-vs-non-capturing-group-different-behaviuor