Why is lookahead (sometimes) faster than capturing?

Submitted by 半腔热情 on 2019-11-28 03:54:38

Question


This question is inspired by this other one.

Compare s/,(\d)/$1/ with s/,(?=\d)//. The former uses a capture group to keep the digit (it is put back through $1) while the comma is removed. The latter uses a lookahead to check whether the comma is followed by a digit, without consuming that digit. Why is the latter sometimes faster, as discussed in this answer?
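A minimal sketch showing that the two substitutions produce the same result. The sample string 1,234,567 is an arbitrary assumption. The /g flag is added here so that every comma is handled, whereas the forms in the question replace only the first match:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input: a number with thousands separators.
my $input = '1,234,567';

# Capture version: match the comma plus the following digit,
# then put the digit back via $1.
(my $via_capture = $input) =~ s/,(\d)/$1/g;

# Lookahead version: match only the comma and require, without
# consuming it, that a digit follows.
(my $via_lookahead = $input) =~ s/,(?=\d)//g;

print "$via_capture\n";    # prints 1234567
print "$via_lookahead\n";  # prints 1234567
```

The end result is identical. The difference lies in the work each version does to get there: the capture version removes two characters and writes one back, while the lookahead version removes just one.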


Answer 1:


The two approaches do different things and have different kinds of overhead. When you capture, perl has to make a copy of the captured text. A lookahead matches without consuming any characters; it has to mark the position where it starts. You can see what's happening by using the re 'debug' pragma:

```perl
use re 'debug';
my $capture = qr/,(\d)/;
```

```
Compiling REx ",(\d)"
Final program:
   1: EXACT <,> (3)
   3: OPEN1 (5)
   5:   DIGIT (6)
   6: CLOSE1 (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 2 
Freeing REx: ",(\d)"
```

```perl
use re 'debug';
my $lookahead = qr/,(?=\d)/;
```

```
Compiling REx ",(?=\d)"
Final program:
   1: EXACT <,> (3)
   3: IFMATCH[0] (8)
   5:   DIGIT (6)
   6:   SUCCEED (0)
   7: TAIL (8)
   8: END (0)
anchored "," at 0 (checking anchored) minlen 1 
Freeing REx: ",(?=\d)"
```
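The "matches without consuming" point can be observed directly through the match-offset arrays @- and @+. In the small sketch below, the test string a,5 is an arbitrary example. The overall match of the capturing pattern covers both the comma and the digit, while the overall match of the lookahead pattern covers only the comma. The same difference shows up as minlen 2 versus minlen 1 in the two compiled programs:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $str = 'a,5';   # hypothetical test string: comma at offset 1, digit at 2

# With the capture group, the digit is part of the overall match.
$str =~ /,(\d)/ or die "no match";
my ($cap_start, $cap_end) = ($-[0], $+[0]);

# With the lookahead, the digit is checked but not consumed.
$str =~ /,(?=\d)/ or die "no match";
my ($look_start, $look_end) = ($-[0], $+[0]);

print "capture:   match spans [$cap_start, $cap_end)\n";    # [1, 3)
print "lookahead: match spans [$look_start, $look_end)\n";  # [1, 2)
```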

I'd expect lookahead to be faster than capturing in most cases, but, as noted in the other thread, regex performance can be data-dependent.




Answer 2:


As always, when you want to know which of two pieces of code runs faster, you have to benchmark it:

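```perl
#!/usr/bin/perl

use 5.012;          # also enables strict and the say feature
use warnings;
use Benchmark qw<cmpthese>;

# Case 1: 512 commas, so neither pattern ever matches.
say "Extreme ,,,:";
my $Text = ',' x (my $LEN = 512);
# A negative count makes each sub run for at least 10 CPU seconds.
cmpthese my $TIME = -10, my $CMP = {
    capture   => \&capture,
    lookahead => \&lookahead,
};

# Case 2: ",0" repeated 512 times, so the very first comma matches.
say "\nExtreme ,0,0,0:";
$Text = ',0' x $LEN;
cmpthese $TIME, $CMP;

# Case 3: mostly commas, with 1% ",0" pairs at the end.
my $P = 0.01;
say "\nMixed (@{[$P * 100]}% zeros):";
my $zeros = $LEN * $P;
$Text = ',' x ($LEN - $zeros) . ',0' x $zeros;
cmpthese $TIME, $CMP;

# Note: without /g, each substitution replaces only the first match.
sub capture {
    local $_ = $Text;
    s/,(\d)/$1/;
}

sub lookahead {
    local $_ = $Text;
    s/,(?=\d)//;
}
```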

The benchmark tests three different cases:

  1. Only ','
  2. Only ',0'
  3. 1% ',0', rest ','
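None of the benchmarked substitutions use /g, so each call replaces at most the first comma that is followed by a digit. The hypothetical helper script below rebuilds the same three strings. For each one, it reports where that first match lies and how many matches a /g version would find:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rebuild the three benchmark inputs the same way the benchmark does.
my $LEN   = 512;
my $zeros = $LEN * 0.01;    # 5.12; the x operator truncates it to 5
my %cases = (
    'only commas' => ',' x $LEN,
    'only ,0'     => ',0' x $LEN,
    'mixed'       => ',' x ($LEN - $zeros) . ',0' x $zeros,
);

my (%first, %count);
for my $name (sort keys %cases) {
    my $text = $cases{$name};

    # Offset of the first comma followed by a digit, or -1 if there is none.
    $first{$name} = $text =~ /,(?=\d)/ ? $-[0] : -1;

    # Number of matches a /g substitution would replace; the benchmarked
    # s/// has no /g, so it replaces at most one of them per call.
    $count{$name} = () = $text =~ /,(?=\d)/g;

    printf "%-12s first match at %4d, %3d matches in total\n",
        $name, $first{$name}, $count{$name};
}
```

It should report no match at all for the comma-only string, a first match at offset 0 (of 512) for the ,0 string, and a first match at offset 506 (of 5) for the mixed string. In the first and third cases, both patterns therefore have to scan hundreds of characters before reaching the point where they actually differ, if they ever do.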

On my machine and with my perl version, it produces these results:

```
Extreme ,,,:
             Rate   capture lookahead
capture   23157/s        --       -1%
lookahead 23362/s        1%        --

Extreme ,0,0,0:
               Rate   capture lookahead
capture    419476/s        --      -65%
lookahead 1200465/s      186%        --

Mixed (1% zeros):
             Rate   capture lookahead
capture   22013/s        --       -4%
lookahead 22919/s        4%        --
```

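These results substantiate the assumption that the lookahead version is significantly faster than the capturing version, except when the string consists almost entirely of commas. In that case, there is little or nothing to substitute, so the running time is presumably dominated by scanning the string rather than by the substitution itself. This is not very surprising, as PSIAlt already explained in his comment.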

regards, Matthias



Source: https://stackoverflow.com/questions/13682758/why-is-lookahead-sometimes-faster-than-capturing
