Does the 'o' modifier for Perl regular expressions still provide any benefit?

醉话见心 2020-11-29 07:48

It used to be considered beneficial to include the 'o' modifier at the end of Perl regular expressions. The current Perl documentation does not even seem to list it, cert…

6 answers
  • 2020-11-29 08:24

    Here are timings for different ways of invoking the match (user CPU seconds for 100,000 matches, measured with getrusage):

    $ perl -v | grep version
    This is perl 5, version 20, subversion 1 (v5.20.1) built for x86_64-linux-gnu-thread-multi
    
    $ perl const-in-re-once.pl | sort
    0.200   =~ CONST
    0.200   =~ m/$VAR/o
    0.204   =~ m/literal-wo-vars/
    0.252   =~ m,@{[ CONST ]},o
    0.260   =~ $VAR
    0.276   =~ m/$VAR/
    0.336   =~ m,@{[ CONST ]},
    

    My code:

    #! /usr/bin/env perl
    
    use strict;
    use warnings;
    
    use Time::HiRes qw/ tv_interval clock_gettime gettimeofday /;
    use BSD::Resource qw/ getrusage RUSAGE_SELF /;
    
    use constant RE =>
        qr{
            https?://
            (?:[^.]+-d-[^.]+\.)?
            (?:(?: (?:dev-)? nind[^.]* | mr02 )\.)?
            (?:(?:pda|m)\.)?
            (?:(?:news|haber)\.)
            (?:.+\.)?
            yandex\.
            .+
        }x;
    
    use constant FINAL_RE => qr,^@{[ RE ]}(/|$),;
    
    my $RE = RE;
    
    use constant ITER_COUNT => 1e5;
    
    use constant URL => 'http://news.trofimenkov.nerpa.yandex.ru/yandsearch?cl4url=www.forbes.ru%2Fnews%2F276745-visa-otklyuchila-rossiiskie-banki-v-krymu&lr=213&lang=ru';
    
    timeit(
        '=~ m/literal-wo-vars/',
        ITER_COUNT,
        sub {
            for (my $i = 0; $i < ITER_COUNT; ++$i) {
                URL =~ m{
                    ^https?://
                    (?:[^.]+-d-[^.]+\.)?
                    (?:(?: (?:dev-)? nind[^.]* | mr02 )\.)?
                    (?:(?:pda|m)\.)?
                    (?:(?:news|haber)\.)
                    (?:.+\.)?
                    yandex\.
                    .+
                    (/|$)
                }x
            }
        }
    );
    
    timeit(
        '=~ m/$VAR/',
        ITER_COUNT,
        sub {
            for (my $i = 0; $i < ITER_COUNT; ++$i) {
                URL =~ m,^$RE(/|$),
            }
        }
    );
    
    timeit(
        '=~ $VAR',
        ITER_COUNT,
        sub {
            my $r = qr,^$RE(/|$),o;
            for (my $i = 0; $i < ITER_COUNT; ++$i) {
                URL =~ $r
            }
        }
    );
    
    timeit(
        '=~ m/$VAR/o',
        ITER_COUNT,
        sub {
            for (my $i = 0; $i < ITER_COUNT; ++$i) {
                URL =~ m,^$RE(/|$),o
            }
        }
    );
    
    timeit(
        '=~ m,@{[ CONST ]},',
        ITER_COUNT,
        sub {
            for (my $i = 0; $i < ITER_COUNT; ++$i) {
                URL =~ m,^@{[ RE ]}(/|$),
            }
        }
    );
    
    timeit(
        '=~ m,@{[ CONST ]},o',
        ITER_COUNT,
        sub {
            for (my $i = 0; $i < ITER_COUNT; ++$i) {
                URL =~ m,^@{[ RE ]}(/|$),o
            }
        }
    );
    
    timeit(
        '=~ CONST',
        ITER_COUNT,
        sub {
            # FINAL_RE is a qr// constant built once at compile time
            for (my $i = 0; $i < ITER_COUNT; ++$i) {
                URL =~ FINAL_RE
            }
        }
    );
    
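    sub timeit {
        my ($name, $iters, $code) = @_;
        # Element 0 of getrusage's list is user CPU time in seconds.
        # Wall-clock alternative: my $t0 = [gettimeofday]; ... tv_interval($t0)
        my $t0 = (getrusage RUSAGE_SELF)[0];
        $code->();
        my $el = (getrusage RUSAGE_SELF)[0] - $t0;
        # total seconds, label, seconds per iteration
        printf "%.3f\t%-17s\t%.9f\n", $el, $name, $el / $iters
    }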
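    If BSD::Resource is not installed, a comparable (and shorter) benchmark can be sketched with Perl's core Benchmark module. The pattern and URL below are simplified, hypothetical stand-ins for RE and URL in the script above, so the numbers will not line up with the table.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Simplified, hypothetical stand-ins for the RE and URL constants above.
my $RE    = qr{https?://(?:[^./]+\.)*yandex\.[a-z]+};
my $url   = 'http://news.example.yandex.ru/yandsearch?lr=213';
my $final = qr{^$RE(/|$)};    # combined once, like FINAL_RE

my %variants = (
    'm/$VAR/'   => sub { $url =~ m{^$RE(/|$)} },
    'm/$VAR/o'  => sub { $url =~ m{^$RE(/|$)}o },
    '=~ qr obj' => sub { $url =~ $final },
    'literal'   => sub { $url =~ m{^https?://(?:[^./]+\.)*yandex\.[a-z]+(/|$)} },
);

# Sanity check: every variant must match before we time them.
for my $name (sort keys %variants) {
    die "$name did not match\n" unless $variants{$name}->();
}

# Run each variant 200_000 times and print a relative-speed table.
cmpthese(200_000, \%variants);
```

    cmpthese prints a rate table; the ordering can differ between Perl versions and machines, so treat any single run as indicative only.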
    
  • 2020-11-29 08:31

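    /o is an optimization for patterns that interpolate a variable. It promises Perl that the pattern will not change even though it contains a variable, so Perl interpolates and compiles it only the first time the match runs and reuses that compiled regex on every later execution, skipping the work of re-interpolating the variable and checking whether the pattern has changed.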
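    A minimal sketch of what that promise means in practice, using two hypothetical subs (has_word_o and has_word): the /o version keeps whatever pattern it compiled on its first call, even when later calls pass a different word.

```perl
use strict;
use warnings;

# With /o the pattern is interpolated and compiled the first time this
# match op runs; later calls reuse it even though $word has changed.
sub has_word_o {
    my ($text, $word) = @_;
    return $text =~ /\b\Q$word\E\b/o ? 1 : 0;
}

# Without /o the pattern is interpolated on every call; Perl only skips
# the recompile when the interpolated string is unchanged.
sub has_word {
    my ($text, $word) = @_;
    return $text =~ /\b\Q$word\E\b/ ? 1 : 0;
}

print has_word_o("a cat", "cat"), "\n";   # 1 (pattern frozen as \bcat\b)
print has_word_o("a dog", "dog"), "\n";   # 0 (still looking for "cat")
print has_word("a dog", "dog"), "\n";     # 1
```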

  • 2020-11-29 08:32

    The /o modifier is in the perlop documentation instead of the perlre documentation since it is a quote-like modifier rather than a regex modifier. That has always seemed odd to me, but that's how it is. Since Perl 5.20, it's now listed in perlre simply to note that you probably shouldn't use it.

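    Before Perl 5.6, Perl would recompile the regex on every execution even if the variable had not changed, which is what made /o worthwhile. Now Perl recompiles only when the interpolated pattern string actually changes, so you no longer need /o for speed. You could still use /o to compile the regex once despite later changes to the variable, but as the other answers note, qr// is better for that.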
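    To illustrate why qr// is the better tool, here is a minimal sketch (the keyword list and the classify sub are hypothetical): each qr// object is compiled once, when the list is built, and then reused. A single m//o inside the loop could not do this, because it would freeze on the first keyword it saw.

```perl
use strict;
use warnings;

# Hypothetical keywords; each qr// is interpolated and compiled once here,
# not on every match.
my @levels    = qw(fatal error warn);
my @level_res = map { qr/\b\Q$_\E\b/i } @levels;

# Return the first level whose precompiled pattern matches, or 'none'.
sub classify {
    my ($line) = @_;
    for my $i (0 .. $#levels) {
        return $levels[$i] if $line =~ $level_res[$i];
    }
    return 'none';
}

print classify("2020-11-29 ERROR disk full"), "\n";   # error
print classify("warn: low memory"), "\n";             # warn
print classify("all good"), "\n";                     # none
```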

  • 2020-11-29 08:32

    One thing it mystifyingly does not do, at least in 5.8.8, is allow a ONCE block:

    perl -le 'for (1..3){ print; m/${\(print( "between 1 and 2 only"), 3)}/o and print "matched" }'

  • 2020-11-29 08:41

    I'm sure it's still supported, but it's pretty much obsolete. If you want the regex to be compiled only once, you're better off using a regex object, like so:

    my $reg = qr/foo$bar/;
    

    The interpolation of $bar is done when $reg is initialized, so from then on, within the enclosing scope, you are always using the same compiled regex. But sometimes you want the regex to be recompiled, because you want it to use the variable's new value. Here's the example Jeffrey Friedl used in The Book (Mastering Regular Expressions):

    sub CheckLogfileForToday()
    {
      my $today = (qw<Sun Mon Tue Wed Thu Fri Sat>)[(localtime)[6]];
    
      my $today_regex = qr/^$today:/i; # compiles once per function call
    
      while (<LOGFILE>) {
        if ($_ =~ $today_regex) {
          print;   # process a line logged today
        }
      }
    }
    

    Within the scope of the function, the value of $today_regex stays the same. But the next time the function is called, the regex will be recompiled with the new value of $today. If he had just used

    if ($_ =~ m/^$today:/io)
    

    ...the regex would never be updated. So, with the object form you have the efficiency of /o without sacrificing flexibility.
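    A small sketch of the snapshot behaviour described above, reusing $bar and $reg from the qr// line (the test strings are made up): the object keeps the value $bar had when it was created, and building a new object picks up the new value.

```perl
use strict;
use warnings;

my $bar = 'bar';
my $reg = qr/foo$bar/;    # $bar is interpolated now: the pattern is foobar

$bar = 'baz';             # later changes do not affect the existing object

print "foobar" =~ $reg ? "match\n" : "no match\n";    # match
print "foobaz" =~ $reg ? "match\n" : "no match\n";    # no match

# Building a new object picks up the current value of $bar.
my $reg2 = qr/foo$bar/;
print "foobaz" =~ $reg2 ? "match\n" : "no match\n";   # match
```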

  • 2020-11-29 08:43

    The Perl 5.20.0 documentation at http://perldoc.perl.org/perlre.html states:

    Modifiers
    
    Other Modifiers
    
    …
    
    o - pretend to optimize your code, but actually introduce bugs
    

    which is a humorous way of saying that /o was meant as an optimisation but in practice mostly introduces bugs: once the pattern has been compiled, later changes to the variables it interpolates are silently ignored.

    Thus the option is best avoided.
