Why do I get the first capture group only?

Question

(https://stackoverflow.com/a/2304626/6607497 and https://stackoverflow.com/a/37004214/6607497 did not help me)

While analyzing a problem with /proc/stat on Linux, I started to write a small utility, but I can't get the capture groups the way I want. Here is the code:

#!/usr/bin/perl
use strict;
use warnings;

if (open(my $fh, '<', my $file = '/proc/stat')) {
    while (<$fh>) {
        if (my ($cpu, @vals) = /^cpu(\d*)(?:\s+(\d+))+$/) {
            print "$cpu $#vals\n";
        }
    }
    close($fh);
} else {
    die "$file: $!\n";
}

For example, with these input lines:

> cat /proc/stat
cpu  2709779 13999 551920 11622773 135610 0 194680 0 0 0
cpu0 677679 3082 124900 11507188 134042 0 164081 0 0 0
cpu1 775182 3866 147044 38910 135 0 15026 0 0 0
cpu2 704411 3024 143057 37674 1272 0 8403 0 0 0
cpu3 552506 4025 136918 38999 160 0 7169 0 0 0
intr 176332106  ...

So the match itself works, but I don't get all the capture groups into @vals (tested with Perl 5.18.2 and 5.26.1).

Answer 1:

When a capture group is repeated by a quantifier, only the last of its repeated matches is captured.
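
A minimal sketch of that behavior, using a shortened sample line. The helper name captures_for is made up for illustration; the pattern is the one from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what the question's pattern captures for one line.
sub captures_for {
    my ($line) = @_;
    # Group 2 sits inside a quantified group, so it is overwritten on each
    # iteration; only the value from the final iteration survives the match.
    return $line =~ /^cpu(\d*)(?:\s+(\d+))+$/;
}

my @caps = captures_for("cpu0 677679 3082 124900");
print scalar(@caps), " captures: [", join("] [", @caps), "]\n";
# prints: 2 captures: [0] [124900]
```

The pattern has two capture groups, so a successful match returns exactly two values no matter how many numbers the line holds.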

Instead, you can just split the line and then check (and adjust) the first field:

while (<$fh>) {
    my ($cpu, @vals) = split;
    next if not $cpu =~ s/^cpu//;
    print "$cpu $#vals\n";
}

If the first element returned by split doesn't start with cpu, the regex substitution fails and the line is skipped. Otherwise, you get the number following cpu (or an empty string), as in the OP.

Or you can exploit the particular structure of the lines you process:

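while (<$fh>) {
    if (my ($cpu, $rest) = /^cpu([0-9]*) \s+ (.*)/x) {
        my @vals = split ' ', $rest;
        print "$cpu $#vals\n";
    }
}

The regex returns two items. The first goes into $cpu as is (either a number or an empty string), while the second is split on whitespace to yield the numbers. Only the second capture is split, because split returns an empty list for an empty string; splitting both captures in a map would drop the empty $cpu of the aggregate cpu line and shift the first number into its place.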

Both these produce the needed output in my tests.
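
As a self-contained sketch of the split approach, the check on the first field can be wrapped in a small sub and tried on sample lines without reading /proc/stat. The name parse_cpu_line is hypothetical, and the check here is slightly stricter than the substitution above: it accepts only cpu followed by optional digits.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (cpu id, values...) for a "cpu" line,
# or an empty list for any other line.
sub parse_cpu_line {
    my ($line) = @_;
    # split ' ' skips leading whitespace and splits on runs of whitespace.
    my ($name, @vals) = split ' ', $line;
    return unless defined $name && $name =~ /^cpu(\d*)$/;
    return ($1, @vals);
}

for my $line ("cpu  2709779 13999 551920\n",
              "cpu0 677679 3082 124900\n",
              "intr 176332106\n") {
    # The list assignment yields the number of returned items, so an
    # empty list (not a cpu line) skips to the next line.
    my ($cpu, @vals) = parse_cpu_line($line) or next;
    print "'$cpu' has ", scalar(@vals), " values\n";
}
```

The aggregate line yields an empty string as its id, so the id is quoted in the output to keep it visible.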

Answer 2:

Going by the example input, the following code inside the while loop should work.

if (/^cpu(\d*)/) {
    my $cpu = $1;
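    # No quantifier around the group: with one, a single match would
    # swallow every number and /g would return only the last capture.
    my @vals = /\s+(\d+)/g;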
    print "$cpu $#vals\n";
}

Answer 3:

In an exercise for Learning Perl, we state a problem that's easy to solve with two simple regexes but hard with one (in Mastering Perl I then pull out the big guns). We don't tell people this because we want to highlight the natural tendency to try to write everything in a single regex. Some of the contortions in other answers remind me of that, and I wouldn't want to maintain any of them.

First, there's the issue of only processing the interesting lines. Then, once we have that line, grab all the numbers. Translating that problem statement into code is very simple and straightforward. No acrobatics here because assertions and anchors do most of the work:

use v5.14;  # the /a modifier used below requires Perl 5.14 or later

while( <DATA> ) {
    next unless /\A cpu(\d*) \s /ax;
    my $cpu = $1;
    my @values = / \b (\d+) \b /agx;
    say "$cpu " . @values;
    }

__END__
cpu  2709779 13999 551920 11622773 135610 0 194680 0 0 0
cpu0 677679 3082 124900 11507188 134042 0 164081 0 0 0
cpu1 775182 3866 147044 38910 135 0 15026 0 0 0
cpu2 704411 3024 143057 37674 1272 0 8403 0 0 0
cpu3 552506 4025 136918 38999 160 0 7169 0 0 0
intr 176332106  ...

Note that the OP still has to decide how to handle the cpu line with no trailing digits. I don't know what you want to do with the empty string.
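
One possible way to handle the empty string, sketched here as an assumption rather than anything the OP asked for, is to store the aggregate line under a made-up key such as 'all'. The sub name collect_cpu_stats and the key are both hypothetical.

```perl
use v5.14;
use warnings;

# Hypothetical helper: maps each cpu line to a key and its list of values.
# The key 'all' for the aggregate "cpu" line is an arbitrary choice.
sub collect_cpu_stats {
    my (@lines) = @_;
    my %stats;
    for my $line (@lines) {
        next unless $line =~ /\A cpu(\d*) \s /ax;
        # Save the key now: the next match overwrites $1.
        my $key = length($1) ? $1 : 'all';
        $stats{$key} = [ $line =~ / \b (\d+) \b /agx ];
    }
    return \%stats;
}

my $stats = collect_cpu_stats("cpu  10 20 30\n", "cpu0 1 2 3\n", "intr 99\n");
for my $key (sort keys %$stats) {
    say "$key has ", scalar @{ $stats->{$key} }, " values";
}
```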

Answer 4:

Perl's regex engine remembers only the last match of a capture group inside a repeated expression. If you want to capture each number in a separate capture group, one option is to write the pattern out explicitly, with one group per number:

if (open(my $fh, '<', my $file = '/proc/stat')) {
    while (<$fh>) {
        if (my ($cpu, @vals) = /^cpu(\d*)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)$/) {
            print "$cpu $#vals\n";
        }
    }
    close($fh);
} else {
    die "$file: $!\n";
}
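
The explicit pattern matches only lines with exactly ten numbers. As a sketch, the same pattern can be generated for a given field count with the repetition operator x. The sub name cpu_line_regex and its $n_fields parameter are assumptions for illustration, and the count still has to equal the number of columns in the input.

```perl
use strict;
use warnings;

# Hypothetical helper: builds a pattern with one capture group per field.
# $n_fields is an assumption; it must equal the exact number of columns.
sub cpu_line_regex {
    my ($n_fields) = @_;
    my $fields = '\s+(\d+)' x $n_fields;   # repeat the sub-pattern as a string
    return qr/^cpu(\d*)$fields$/;
}

my $re   = cpu_line_regex(10);
my $line = "cpu0 677679 3082 124900 11507188 134042 0 164081 0 0 0\n";
if (my ($cpu, @vals) = $line =~ $re) {
    print "$cpu $#vals\n";   # prints "0 9"
}
```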

Answer 5:

Replacing

    while (<$fh>) {
        if (my ($cpu, @vals) = /^cpu(\d*)(?:\s+(\d+))+$/) {

with

    while (<$fh>) {
        my @vals;
        if (my ($cpu) = /^cpu(\d*)(?:\s+(\d+)(?{ push(@vals, $^N) }))+$/) {

does what I wanted (requires perl 5.8 or newer).

Answer 6:

Here's my example. I thought I'd add it because I like simple code. It also allows "cpu7" with no trailing digits.

#!/usr/bin/perl
use strict;
use warnings;

my $file = "/proc/stat";
open(my $fh, "<", $file) or die "$file: $!\n";
while (<$fh>) 
{
  if ( /^cpu(\d+)(\s+)?(.*)$/ ) 
  {
    my $cpu = $1; 
    my $vals = scalar split( /\s+/, $3 ) ;
    print "$cpu $vals\n";
  }
}
close($fh);

Answer 7:

Just adding to Tim's answer:

You can capture multiple values with one group (using the g modifier), but then you have to split the match into two statements.

    if (my ($cpu) = /^cpu(\d*)(?:\s+(\d+))+$/) {
        my @vals= /(?:\s+(\d+))/g;
        print "$cpu $#vals\n";
    }
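
A minimal sketch of the list-context /g behavior on its own, using a sample line instead of /proc/stat. The helper name all_values is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns every number that follows whitespace.
sub all_values {
    my ($line) = @_;
    # No quantifier around the group: each /g iteration matches one
    # "whitespace + number" pair and contributes one capture to the list.
    my @vals = $line =~ /\s+(\d+)/g;
    return @vals;
}

my @vals = all_values("cpu1 775182 3866 147044\n");
print scalar(@vals), " values: @vals\n";
# prints: 3 values: 775182 3866 147044
```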

Source: https://stackoverflow.com/questions/62690858/why-do-i-get-the-first-capture-group-only

Tags

regex

perl

regex-group