Why/how is an additional variable needed in matching repeated arbitrary character with capture groups?

故里飘歌 2020-12-20 13:50

I'm matching a sequence of a repeating arbitrary character, with a minimum length, using a perl6 regex.

After reading through https://docs.perl6.org/language/regex

3 Answers
  •  醉梦人生
    2020-12-20 14:16

    The reason you have to store the capture in something other than $0 is that every capturing () starts a new, nested set of numbered captures ($0, $1, ...).

    So the $0 inside of ($0) refers to the numbered captures of that new group, not to the (.) outside it. Nothing inside those () ever sets their $0, so it can never refer to anything.

    (Named captures such as $<name> are affected by this in the same way.)


    The following has 3 separate $0 “variables”, and one $1 “variable”:

    'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /
    
    'aabbaabb' ~~ /
                    ^
    
                    # $0 = 'aabb'
                    (
    
                      # $0 = 'a'
                      (.) $0
    
                      # $1 = 'bb'
                      (
    
                        # $0 = 'b'
                        (.) $0
                      )
                    )
    
                    $0
    
                    $
                  /
    
    「aabbaabb」
     0 => 「aabb」
      0 => 「a」
      1 => 「bb」
       0 => 「b」
    
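    To see those three separate $0s from ordinary code, you can index into the Match object that the smartmatch returns. Each capturing () produces a Match with its own positional list, so index 0 reaches a different capture at each level of nesting. This is a minimal sketch, not part of the original answer. The variable name $m is only for illustration, and the values in the comments are the ones shown in the match tree above:

    ```raku
    # Same regex as above: every capturing () has its own $0, $1, ...
    my $m = 'aabbaabb' ~~ / ^ ( (.)$0 ((.)$0) ) $0 $ /;

    put ~$m[0];        # top-level $0                   -> aabb
    put ~$m[0][0];     # $0 inside the outer ()         -> a
    put ~$m[0][1];     # $1 inside the outer ()         -> bb
    put ~$m[0][1][0];  # $0 inside the second inner ()  -> b
    ```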

    Basically, () in the regex DSL acts a bit like a {} block in normal Perl6: each one gets its own fresh $/, and therefore its own $0, $1, etc.

    A fairly direct, if simplified, translation of the above regex into “regular” Perl6 code follows.
    (Pay attention to the three lines with my $/ = [];.)
    (The / ^ / style comments show which part of the regex above, such as ^, each piece of code corresponds to.)

    given 'aabbaabb' {
        my $/ = [];      # give assignable storage for $0, $1 etc.
        my $pos = 0;     # position counter
        my $init = $pos; # initial position
    
        # / ^ /
        fail unless $pos == 0;
    
        # / ( /
        $0 = do {
            my $/ = [];
            my $init = $pos;
    
            # / (.) $0 /
            $0 = .substr($pos,1); # / (.) /
            $pos += $0.chars;
            fail unless .substr($pos,$0.chars) eq $0; # / $0 /
            $pos += $0.chars;
    
            # / ( /
            $1 = do {
                my $/ = [];
                my $init = $pos;
    
                # / (.) $0 /
                $0 = .substr($pos,1); # / (.) /
                $pos += $0.chars;
                fail unless .substr($pos,$0.chars) eq $0; # / $0 /
                $pos += $0.chars;
    
            # / ) /
                # the returned value (becomes $1 in outer scope)
                .substr($init, $pos - $init)
            }
    
        # / ) /
            # the returned value (becomes $0 in outer scope)
            .substr($init, $pos - $init)
        }
    
        # / $0 /
        fail unless .substr($pos,$0.chars) eq $0;
        $pos += $0.chars;
    
        # / $ /
        fail unless $pos == .chars;
    
        # the returned value
        .substr($init, $pos - $init)
    }
    

    TL;DR:

    Just remove the () surrounding ($c) / ($0), so that the backreference is a plain $0 inside the outer capture group.
    (This assumes you didn't need that capture for something else.)

    /((.) $0**2..*)/
    
    perl6 -e '$_="bbaaaaawer"; /((.) $0**2..*)/ && put $0';
    
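    You can also combine the same regex with the :g adverb to collect every run of three or more identical characters, not just the first one. This is a sketch, not part of the original answer. The input string and the @runs name are made up for illustration:

    ```raku
    # (.) captures one character; $0 ** 2..* then requires at least two more
    # copies of it, so each outer capture is a run of 3+ identical characters.
    my @runs = 'bbaaaaawerrr'.match(/ ((.) $0 ** 2..*) /, :g).map({ ~.[0] });
    say @runs;  # [aaaaa rrr]
    ```

    The leading "bb" is skipped because it is only two characters long.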
