Matching balanced parentheses in Perl regex

后悔当初 2021-01-12 08:52

I have an expression which I need to split and store in an array:

aaa="bbb{ccc}ffffd" { aa="bb,cc" { a="b", c="d" } }, aaa="bbb{}" { aa="b}b" }, aa


6 Answers
  • 2021-01-12 09:33

    There's an example in perlre, using the recursive regex features introduced in v5.10. Although you are limited to v5.8, other people coming to this question should get the right solution :)

    my $re = qr{
                (                                # paren group 1 (full function)
                    foo
                    (                            # paren group 2 (parens)
                        \(
                            (                    # paren group 3 (contents of parens)
                                (?:
                                    (?> [^()]+ ) # Non-parens without backtracking
                                    |
                                    (?2)         # Recurse to start of paren group 2
                                )*
                            )
                        \)
                    )
                )
        }x;
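    Applying the same recursion to this question's {} braces could look like the sketch below (Perl 5.10 or later). It is an illustration rather than part of the perlre example. The names split_top_level and $item_re and the named group brace are hypothetical, and the sketch assumes double-quoted strings never contain escaped quotes. Each quoted string is matched as a single unit, so braces and commas inside it are ignored. Each {...} block recurses into itself through (?&brace).

```perl
use strict;
use warnings;
use 5.010;   # (?&name) recursion and possessive quantifiers need 5.10+

# One top-level item: a run of plain text, double-quoted strings and
# balanced {...} blocks. Escaped quotes (\") are NOT handled.
my $item_re = qr/
    (?:
        [^,{}"]++                      # plain text: no comma, brace or quote
      | "[^"]*"                        # a double-quoted string, taken whole
      | (?<brace>                      # a balanced { ... } block
            \{
            (?: [^{}"]++ | "[^"]*" | (?&brace) )*   # commas allowed inside
            \}
        )
    )+
/x;

# Hypothetical helper: split a string on commas that are outside
# quotes and outside any {...} block.
sub split_top_level {
    my ($str) = @_;
    my @items;
    # \G anchors each match where the previous one ended; \c keeps pos()
    # after the final failed match so we can check everything was consumed.
    while ($str =~ /\G\s*($item_re)\s*(?:,|\z)/gc) {
        (my $item = $1) =~ s/\s+\z//;   # trim trailing whitespace
        push @items, $item;
    }
    die "could not parse input near offset ", (pos($str) // 0), "\n"
        unless (pos($str) // 0) == length $str;
    return @items;
}

my $input = 'aaa="bbb{ccc}ffffd" { aa="bb,cc" { a="b", c="d" } }, '
          . 'aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"';
print "$_\n" for split_top_level($input);
# Prints:
# aaa="bbb{ccc}ffffd" { aa="bb,cc" { a="b", c="d" } }
# aaa="bbb{}" { aa="b}b" }
# aaa="bbb,ccc"
```

    Named recursion is used here instead of a numbered (?2) because $item_re is interpolated into a larger pattern, which renumbers its capture groups.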
    
  • 2021-01-12 09:35

    Although recursive regular expressions can usually be used to match balanced braces {}, they won't work for you, because you ALSO need to match balanced quotes ".
    This would be a very tricky task for a Perl regular expression, and I'm fairly certain it's not possible. (In contrast, it could probably be done with the "balancing groups" feature of Microsoft's .NET regex engine.)

    I would suggest writing your own parser. As you process each character, keep track of whether you are inside a "..." string and how deeply you are nested in {}, and only split on , when both are balanced.
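    A minimal sketch of that idea, assuming double-quoted strings contain no escaped quotes; split_balanced is a hypothetical name. It walks the string one character at a time. Each " toggles an in-quote flag, { and } are counted only outside quotes, and a comma acts as a separator only at depth 0 outside quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are outside any "..." string
# and outside any {...} block. Escaped quotes (\") are not supported.
sub split_balanced {
    my ($str) = @_;
    my @parts    = ('');
    my $depth    = 0;    # current {} nesting depth
    my $in_quote = 0;    # true while inside "..."

    for my $ch (split //, $str) {
        if ($in_quote) {
            $in_quote = 0 if $ch eq '"';    # closing quote
        }
        elsif ($ch eq '"') {
            $in_quote = 1;                  # opening quote
        }
        elsif ($ch eq '{') {
            $depth++;
        }
        elsif ($ch eq '}') {
            $depth--;
            die "unbalanced '}'\n" if $depth < 0;
        }
        elsif ($ch eq ',' && $depth == 0) {
            push @parts, '';                # top-level comma: start a new part
            next;                           # and drop the comma itself
        }
        $parts[-1] .= $ch;
    }
    die "unbalanced '{' or unterminated string\n" if $depth != 0 || $in_quote;

    s/^\s+|\s+\z//g for @parts;             # trim whitespace around each part
    return @parts;
}

my $input = 'aaa="bbb{ccc}ffffd" { aa="bb,cc" { a="b", c="d" } }, '
          . 'aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"';
print "$_\n" for split_balanced($input);
# Prints:
# aaa="bbb{ccc}ffffd" { aa="bb,cc" { a="b", c="d" } }
# aaa="bbb{}" { aa="b}b" }
# aaa="bbb,ccc"
```

    Because it only uses a plain loop and simple substitutions, this sketch should also run on Perl 5.8.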

  • 2021-01-12 09:36

    A split solution seems simplest. Split at each position that is followed by your top-level key aaa (a lookahead, with word boundaries around aaa), and let the optional character class [,\s]* consume the comma and whitespace in front of it. This relies on every top-level element starting with aaa.

    my $string = 'aaa="bbb{ccc}ffffd" { aa="bb,cc" { a="b", c="d" } }, aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"';
    my @array = split /[,\s]*(?=\baaa\b)/, $string;
    
  • 2021-01-12 09:38

    I agree with Scott Rippey, more or less, about writing your own parser. Here's a simple one:

    my $in = 'aaa="bbb{ccc}ffffd" { aa="bb,cc" { a="b", c="d" } }, ' .
             'aaa="bbb{}" { aa="b}b" }, ' .
             'aaa="bbb,ccc"'
    ;
    
    my @out = ('');
    
    my $nesting = 0;
    while($in !~ m/\G$/cg)
    {
      if($nesting == 0 && $in =~ m/\G,\s*/cg)
      {
        push @out, '';
        next;
      }
      if($in =~ m/\G(\{+)/cg)
        { $nesting += length $1; }
      elsif($in =~ m/\G(\}+)/cg)
      {
        $nesting -= length $1;
        die if $nesting < 0;
      }
      elsif($in =~ m/\G((?:[^{}",]|"[^"]*")+|,)/cg)
        { } # text and "..." strings; a bare comma only gets here inside braces
      else
        { die; }
      $out[-1] .= $1;
    }
    

    (Tested in Perl 5.10; sorry, I don't have Perl 5.8 handy, but so far as I know there aren't any relevant differences.) Needless to say, you'll want to replace the dies with something application-specific. And you'll likely have to tweak the above to handle cases not included in your example. (For example, can quoted strings contain \"? Can ' be used instead of "? This code doesn't handle either of those possibilities.)
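    For the first of those questions, here is a sketch of the same parser wrapped in a sub (split_nested is a hypothetical name). It assumes one common convention: a backslash escapes the next character inside a quoted string, so "x\"y" is a single string. It also leaves commas out of the plain-text run and matches a lone comma separately, so a top-level comma always ends an element. Single-quoted strings are still not handled.

```perl
use strict;
use warnings;

# A double-quoted string in which a backslash escapes the next character
# (assumed convention), so "x\"y" is one token.
my $dq = qr/"(?:[^"\\]|\\.)*"/s;

# Hypothetical wrapper around the parser above, using $dq for strings.
sub split_nested {
    my ($in) = @_;
    my @out = ('');
    my $nesting = 0;
    while ($in !~ m/\G\z/gc) {
        # A comma at nesting depth 0 starts a new element.
        if ($nesting == 0 && $in =~ m/\G,\s*/gc) {
            push @out, '';
            next;
        }
        my $tok;
        if ($in =~ m/\G(\{+)/gc) {
            $nesting += length $1;
            $tok = $1;
        }
        elsif ($in =~ m/\G(\}+)/gc) {
            $nesting -= length $1;
            die "unbalanced '}'\n" if $nesting < 0;
            $tok = $1;
        }
        elsif ($in =~ m/\G((?:[^{}",]|$dq)+|,)/gc) {
            # plain text and quoted strings, or a lone comma inside braces
            $tok = $1;
        }
        else {
            die "cannot parse input at offset ", (pos($in) || 0), "\n";
        }
        $out[-1] .= $tok;
    }
    die "unbalanced '{'\n" if $nesting != 0;
    return @out;
}

my $input = 'aaa="x\",y" { b="}" }, aaa="z"';
print "$_\n" for split_nested($input);
# Prints:
# aaa="x\",y" { b="}" }
# aaa="z"
```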

  • 2021-01-12 09:39

    Use the Perl module Regexp::Common. Its $RE{balanced} pattern matches properly nested parentheses, and the -parens option lets you choose other delimiters such as {}.

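    use strict;
    use warnings;
    use Regexp::Common;

    my $string = 'aaa="bbb{ccc}ffffd" { aa="bb,cc" { a="b", c="d" } }, aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"';
    # Pattern for one balanced {...} group, nested braces included
    my $bp = $RE{balanced}{-parens => '{}'};
    my @groups = $string =~ /($bp)/g;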
    
  • 2021-01-12 09:40

    Try something like this:

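    use strict;
    use warnings;
    use Data::Dumper;
    
    my $exp = <<'END';
    aaa="bbb{ccc}ffffd" { aa="bb,cc" { a="b", c="d" } }     , aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"
    END
    
    chomp $exp;
    # Split on a comma that directly follows a closing brace (whitespace around
    # it is dropped); the lookbehind keeps the "}" in the preceding element, so
    # the last element no longer gets a spurious "}" appended.
    my @arr = split /(?<=\})\s*,\s*/, $exp;
    print Dumper(\@arr);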
    