问题
I'm trying to come up with an example where positive look-around works but non-capture groups won't work, to further understand their usages. The examples I"m coming up with all work with non-capture groups as well, so I feel like I"m not fully grasping the usage of positive look around.
Here is a string, (taken from a SO example) that uses positive look ahead in the answer. The user wanted to grab the second column value, only if the value of the first column started with ABC, and the last column had the value 'active'.
string ='''ABC1 1.1.1.1 20151118 active
ABC2 2.2.2.2 20151118 inactive
xxx x.x.x.x xxxxxxxx active'''
The solution given used 'positive look ahead' but I noticed that I could use non-caputure groups to arrive at the same answer. So, I'm having trouble coming up with an example where positive look-around works, non-capturing group doesn't work.
pattern =re.compile('ABC\w\s+(\S+)\s+(?=\S+\s+active)') #solution
pattern =re.compile('ABC\w\s+(\S+)\s+(?:\S+\s+active)') #solution w/out lookaround
If anyone would be kind enough to provide an example, I would be grateful.
Thanks.
回答1:
The fundamental difference is the fact, that non-capturing groups still consume the part of the string they match, thus moving the cursor forward.
One example where this makes a fundamental difference is when you try to match certain strings, that are surrounded by certain boundaries and these boundaries can overlap. Sample task:
Match all a
s from a given string, that are surrounded by b
s - the given string is bababaca
. There should be two matches, at positions 2 and 4.
Using lookarounds this is rather easy, you can use b(a)(?=b)
or (?<=b)a(?=b)
and match them. But (?:b)a(?:b)
won't work - the first match will also consume the b at position 3, that is needed as boundary for the second match. (note: the non-capturing group isn't actually needed here)
Another rather prominent sample are password validations - check that the password contains uppercase, lowercase letters, numbers, whatever - you can use a bunch of alternations to match these - but lookaheads come in way easier:
(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!?.])
vs
(?:.*[a-z].*[A-Z].*[0-9].*[!?.])|(?:.*[A-Z][a-z].*[0-9].*[!?.])|(?:.*[0-9].*[a-z].*[A-Z].*[!?.])|(?:.*[!?.].*[a-z].*[A-Z].*[0-9])|(?:.*[A-Z][a-z].*[!?.].*[0-9])|...
来源:https://stackoverflow.com/questions/45944858/functional-difference-between-lookarounds-and-non-capture-group