How can I tell which of the alternatives matched in a Perl regular expression pattern?

余生长醉 提交于 2019-12-03 17:17:10

You don't need to code up your own state machine to combine regexes. Look into Regexp:Assemble. It has methods that'll track which of your initial patterns matched.

Edit:

use strict;
use warnings;

use 5.012;

use Regexp::Assemble;

my $string = 'A123';

my $re = Regexp::Assemble->new(track => 1);
for my $pattern (qw/ A(\d+) B(\d+) C(\d+) /) {
  $re->add($pattern);
}

say $re->re; ### (?-xism:(?:A(\d+)(?{0})|B(\d+)(?{2})|C(\d+)(?{1})))
say for $re->match($string); ### A(\d+)
say for $re->capture; ### 123

Why not use /^ (?<prefix> A|B|C) (?<digits> \d+) $/x. Note, named capture groups used for clarity, and not essential.

A123 will be in capture group $1 and 123 will be in group $2

So you could say:

if ( /^(A(\d+))|(B(\d+))|(C(\d+))$/ && $1 eq 'A123' && $2 eq '123' ) {
    ...
}

This is redundant, but you get the idea...

EDIT: No, you don't have to enumerate each sub match, you asked how to know whether A123 matched and how to extract 123:

  • You won't enter the if block unless A123 matched
  • and you can extract 123 using the $2 backreference.

So maybe this example would have been more clear:

if ( /^(A(\d+))|(B(\d+))|(C(\d+))$/ ) {
    # do something with $2, which will be '123' assuming $_ matches /^A123/
}

EDIT 2:

To capture matches in an AoA (which is a different question, but this should do it):

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my @matches = map { [$1,$2] if /^(?:(A|B|C)(\d+))$/ } <DATA>;
print Dumper \@matches;

__DATA__
A123
B456
C769

Result:

$VAR1 = [
          [
            'A',
            '123'
          ],
          [
            'B',
            '456'
          ],
          [
            'C',
            '769'
          ]
        ];

Note that I modified your regex, but it looks like that's what you're going for judging by your comment...

With your example data, it is easy to write

'A123' =~ /^([ABC])(\d+)$/;

after which $1 will contain the prefix and $2 the suffix.

I cannot tell whether this is relevant to your real data, but to use an additional module seems like overkill.

Another thing you can do in Perl is to embed Perl code directly in your regex using "(?{...})". So, you can set a variable that tells you which part of the regex matched. WARNING: your regex should not contain any variables (outside of the embedded Perl code), that will be interpolated into the regex or you will get errors. Here is a sample parser that uses this feature:

my $kind;
my $REGEX  = qr/
          [A-Za-z][\w]*                        (?{$kind = 'IDENT';})
        | (?: ==? | != | <=? | >=? )           (?{$kind = 'OP';})
        | -?\d+                                (?{$kind = 'INT';})
        | \x27 ( (?:[^\x27] | \x27{2})* ) \x27 (?{$kind = 'STRING';})
        | \S                                   (?{$kind = 'OTHER';})
        /xs;

my $line = "if (x == 'that') then x = -23 and y = 'say ''hi'' for me';";
my @tokens;
while ($line =~ /( $REGEX )/xsg) {
    my($match, $str) = ($1,$2);
    if ($kind eq 'STRING') {
        $str =~ s/\x27\x27/\x27/g;
        push(@tokens, ['STRING', $str]);
        }
    else {
        push(@tokens, [$kind, $match]);
        }
    }
foreach my $lItems (@tokens) {
    print("$lItems->[0]: $lItems->[1]\n");
    }

which prints out the following:

IDENT: if
OTHER: (
IDENT: x
OP: ==
STRING: that
OTHER: )
IDENT: then
IDENT: x
OP: =
INT: -23
IDENT: and
IDENT: y
OP: =
STRING: say 'hi' for me
OTHER: ;

It's kind of contrived, but you'll notice that the quotes (actually, apostrophes) around strings are stripped off (also, consecutive quotes are collapsed to single quotes), so in general, only the $kind variable will tell you whether the parser saw an identifier or a quoted string.

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!