This question is inspired by this other one.
Comparing s/,(\d)/$1/ to s/,(?=\d)//: the former captures the digit and writes it back, so that only the comma is removed; the latter uses a lookahead to check whether the comma is followed by a digit. Why is the latter sometimes faster, as discussed in this answer?
The two approaches do different things and have different kinds of overhead. When you capture, perl has to make a copy of the captured text. A look-ahead matches without consuming anything; it only has to mark the location where it starts. You can see what's happening by using the re 'debug' pragma:
use re 'debug';
my $capture = qr/,(\d)/;
Compiling REx ",(\d)"
Final program:
   1: EXACT <,> (3)
   3: OPEN1 (5)
   5:   DIGIT (6)
   6: CLOSE1 (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 2
Freeing REx: ",(\d)"
use re 'debug';
my $lookahead = qr/,(?=\d)/;
Compiling REx ",(?=\d)"
Final program:
   1: EXACT <,> (3)
   3: IFMATCH[0] (8)
   5:   DIGIT (6)
   6:   SUCCEED (0)
   7: TAIL (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 1
Freeing REx: ",(?=\d)"
I'd expect look-ahead to be faster than capturing in most cases but, as noted in the other thread, regex performance can be data dependent.
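To make the difference concrete, here is a minimal sketch (the sample string and variable names are made up for illustration) showing that both substitutions produce the same text, while the overall match covers two characters in the capturing version and only one in the look-ahead version. That corresponds to minlen 2 versus minlen 1 in the debug output above.

```perl
use strict;
use warnings;

# Hypothetical sample input, chosen only for illustration.
my $input = '1,234,567 apples, pears';

# Capture version: the digit is consumed, copied into $1, and written back.
(my $with_capture = $input) =~ s/,(\d)/$1/g;

# Look-ahead version: only the comma is consumed; the digit is never part of the match.
(my $with_lookahead = $input) =~ s/,(?=\d)//g;

print "capture:   $with_capture\n";
print "lookahead: $with_lookahead\n";

# Length of the overall match for a single ",5", computed from @- and @+.
my ($capture_len, $lookahead_len);
if (',5' =~ /,(\d)/)   { $capture_len   = $+[0] - $-[0] }
if (',5' =~ /,(?=\d)/) { $lookahead_len = $+[0] - $-[0] }

print "capture match length:   $capture_len\n";
print "lookahead match length: $lookahead_len\n";
```

Both substitutions yield "1234567 apples, pears"; the comma before " pears" stays because no digit follows it.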
As always, when you want to know which of two pieces of code works faster, you have to test it:
#!/usr/bin/perl
use 5.012;
use warnings;
use Benchmark qw<cmpthese>;
say "Extreme ,,,:";
my $Text = ',' x (my $LEN = 512);
cmpthese my $TIME = -10, my $CMP = {
    capture   => \&capture,
    lookahead => \&lookahead,
};
say "\nExtreme ,0,0,0:";
$Text = ',0' x $LEN;
cmpthese $TIME, $CMP;
my $P = 0.01;
say "\nMixed (@{[$P * 100]}% zeros):";
my $zeros = $LEN * $P;
$Text = ',' x ($LEN - $zeros) . ',0' x $zeros;
cmpthese $TIME, $CMP;
# Each sub works on a fresh copy of $Text.
# Without /g, only the first match is replaced.
sub capture {
    local $_ = $Text;
    s/,(\d)/$1/;
}

sub lookahead {
    local $_ = $Text;
    s/,(?=\d)//;
}
The benchmark tests three different cases:
- Only ','
- Only ',0'
- 1% ',0', rest ','
On my machine and with my perl version, it produces these results:
Extreme ,,,:
             Rate   capture lookahead
capture   23157/s        --       -1%
lookahead 23362/s        1%        --

Extreme ,0,0,0:
               Rate   capture lookahead
capture    419476/s        --      -65%
lookahead 1200465/s      186%        --

Mixed (1% zeros):
             Rate   capture lookahead
capture   22013/s        --       -4%
lookahead 22919/s        4%        --
These results substantiate the assumption that the look-ahead version is significantly faster than the capturing version, except when the input consists almost entirely of commas, where the two perform about the same. That is not very surprising, as PSIAlt already explained in his comment.
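The benchmark above applies each substitution once, without /g. If your real use replaces every occurrence, a variant along the following lines may be closer to what you want to measure. This is only a sketch: the sample text, the helper names strip_capture and strip_lookahead, and the iteration count are assumptions, and the timings will depend on your data and perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical sample input; substitute text that resembles your real data.
my $sample = join ' ', ('1,234,567 items, 89,000 units') x 50;

# Remove every comma that precedes a digit, using a capture group.
sub strip_capture {
    my ($text) = @_;
    $text =~ s/,(\d)/$1/g;
    return $text;
}

# Same transformation, using a look-ahead instead of a capture.
sub strip_lookahead {
    my ($text) = @_;
    $text =~ s/,(?=\d)//g;
    return $text;
}

# Both variants must agree before their timings are worth comparing.
die "results differ" unless strip_capture($sample) eq strip_lookahead($sample);

# A fixed iteration count keeps the run short; raise it for steadier numbers.
cmpthese(20_000, {
    capture   => sub { strip_capture($sample) },
    lookahead => sub { strip_lookahead($sample) },
});
```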
regards, Matthias
Source: https://stackoverflow.com/questions/13682758/why-is-lookahead-sometimes-faster-than-capturing