I have a string $s1 = "a_b";
and I want to match this string but only capture the letters. I tried to use a lookahead:
if($s1 =~ /([a-z])(?=_)
A lookahead inspects the positions immediately ahead of the current one without consuming them. If the assertion succeeds, the engine backtracks to the end of the previous match (right after a) and continues matching from there. Your regex would work only if you put a _ right after the positive lookahead: ([a-z])(?=_)_([a-z])
You don't even need (non-)capturing groups in substitution:
if ($s1 =~ /([a-z])_([a-z])/) { print "Captured: $1, $2\n"; }
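To make the point about the underscore concrete, here is a minimal sketch that runs two variants of the pattern against the same string. The variable names $without_underscore and @captures are only illustrative and do not come from the question.

```perl
use strict;
use warnings;

my $s1 = "a_b";

# The lookahead only asserts that "_" follows; it does not consume it.
# The second group is therefore tried at "_" itself and cannot match.
my $without_underscore = $s1 =~ /([a-z])(?=_)([a-z])/ ? 1 : 0;

# With a literal "_" after the lookahead, the underscore is consumed
# and both letters end up in the capture groups.
my @captures = $s1 =~ /([a-z])(?=_)_([a-z])/;

print "match without literal underscore: $without_underscore\n";
print "captures with literal underscore: @captures\n";
```

The first pattern fails on "a_b" and the second captures a and b, which is the same result the plain /([a-z])_([a-z])/ gives.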
In reply to @Borodin's comment
I think that moving backwards is the same as a backtrack, which is easier to recognize by debugging the whole thing (Perl debug mode):
Matching REx "a(?=_)_b" against "a_b"
.
.
.
0 <> <a_b> | 0| 1:EXACT <a>(3)
1 <a> <_b> | 0| 3:IFMATCH[0](9)
1 <a> <_b> | 1| 5:EXACT <_>(7)
2 <a_> <b> | 1| 7:SUCCEED(0)
| 1| subpattern success...
1 <a> <_b> | 0| 9:EXACT <_b>(11)
3 <a_b> <> | 0| 11:END(0)
Match successful!
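A trace like the one above can be produced with the re pragma. This is a small sketch, assuming a reasonably recent Perl; the exact layout of the output differs between versions, and it is written to STDERR rather than STDOUT. The variable $matched is only there for illustration.

```perl
use strict;
use warnings;

my $matched;
{
    # 'use re' is lexically scoped, so only the regex inside this
    # block is compiled and executed with debugging output enabled.
    use re 'debug';
    $matched = "a_b" =~ /a(?=_)_b/ ? 1 : 0;
}

print "matched: $matched\n";
```

The same thing works from the command line as perl -Mre=debug -e '"a_b" =~ /a(?=_)_b/'.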
As the debug output above shows, at the fourth line of the results (when the third step takes place) the engine has consumed the characters a_ while inside the lookahead assertion. Then a backtrack happens after the positive lookahead succeeds: the engine skips back over the whole sub-pattern and resumes at the position right after a.
At line #5, the engine has consumed only one character: a. Regex101 debugger:
How I interpret this backtrack is clearer in this illustration (thanks to @JDB, whose style of representation I borrowed):
a(?=_)_b
*
|\
| \
| : a (match)
| * (?=_)
| |↖
| | ↖
| |↘ ↖
| | ↘ ↖
| | ↘ ↖
| | : _ (match)
| | ^ SUBPATTERN SUCCESS (OP_ASSERT :=> MATCH_MATCH)
| * _b
| |\
| | \
| | : _ (match)
| | : b (match)
| | /
| |/
| /
|/
MATCHED
By this I mean that if the lookahead assertion succeeds, then, since part of the input string has already been examined, the engine goes back up to the previous match offset (eptr, the pointer into the subject, is not changed, but the offset is). It resets the consumed characters and tries to continue matching from there, and that is what I call a backtrack. Below is a visual representation of the steps taken by the engine, produced with Regexp::Debugger:
So I see it as a backtrack, or a kind of one. However, if I'm wrong about any of this, I'd welcome corrections with open arms.
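The offsets described above can also be checked without a debugger. The sketch below is my own illustration rather than part of the traces shown earlier: it places (?{ ... }) code blocks inside and right after the lookahead and records pos() at each point. The names @positions and $ok are hypothetical.

```perl
use strict;
use warnings;

my @positions;

# (?{ ... }) runs Perl code when the engine reaches that point.
# Inside such a block, pos() reports the current offset in the
# string being matched.
my $ok = "a_b" =~ /a(?=_(?{ push @positions, pos() }))(?{ push @positions, pos() })_b/;

print "offset inside the lookahead, after _ : $positions[0]\n";
print "offset right after the lookahead     : $positions[1]\n";
```

The first recorded offset should be past the underscore and the second should be back at the position right after a. Whether you call that a backtrack or simply a zero-width assertion restoring its position is the terminology question discussed above.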