Matching balanced parenthesis in Perl regex

青春壹個敷衍的年華 提交于 2019-12-01 02:28:37

问题


I have an expression which I need to split and store in an array:

aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }, aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"

It should look like this once split and stored in the array:

aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }
aaa="bbb{}" { aa="b}b" }
aaa="bbb,ccc"

I use Perl version 5.8 and could someone resolve this?


回答1:


Use the perl module "Regexp::Common". It has a nice balanced parenthesis Regex that works well.

# ASN.1
use Regexp::Common;
$bp = $RE{balanced}{-parens=>'{}'};
@genes = $l =~ /($bp)/g;



回答2:


There's an example in perlre, using the recursive regex features introduced in v5.10. Although you are limited to v5.8, other people coming to this question should get the right solution :)

$re = qr{ 
            (                                # paren group 1 (full function)
                foo
                (                            # paren group 2 (parens)
                    \(
                        (                    # paren group 3 (contents of parens)
                            (?:
                                (?> [^()]+ ) # Non-parens without backtracking
                                |
                                (?2)         # Recurse to start of paren group 2
                            )*
                        )
                    \)
                )
            )
    }x;



回答3:


Try something like this:

use strict;
use warnings;
use Data::Dumper;

my $exp=<<END;
aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }     , aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"
END

chomp $exp;
my @arr = map { $_ =~ s/^\s*//; $_ =~ s/\s* $//; "$_}"} split('}\s*,',$exp);
print Dumper(\@arr);



回答4:


I agree with Scott Rippey, more or less, about writing your own parser. Here's a simple one:

my $in = 'aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }, ' .
         'aaa="bbb{}" { aa="b}b" }, ' .
         'aaa="bbb,ccc"'
;

my @out = ('');

my $nesting = 0;
while($in !~ m/\G$/cg)
{
  if($nesting == 0 && $in =~ m/\G,\s*/cg)
  {
    push @out, '';
    next;
  }
  if($in =~ m/\G(\{+)/cg)
    { $nesting += length $1; }
  elsif($in =~ m/\G(\}+)/cg)
  {
    $nesting -= length $1;
    die if $nesting < 0;
  }
  elsif($in =~ m/\G((?:[^{}"]|"[^"]*")+)/cg)
    { }
  else
    { die; }
  $out[-1] .= $1;
}

(Tested in Perl 5.10; sorry, I don't have Perl 5.8 handy, but so far as I know there aren't any relevant differences.) Needless to say, you'll want to replace the dies with something application-specific. And you'll likely have to tweak the above to handle cases not included in your example. (For example, can quoted strings contain \"? Can ' be used instead of "? This code doesn't handle either of those possibilities.)




回答5:


Although Recursive Regular Expressions can usually be used to capture "balanced braces" {}, they won't work for you, because you ALSO have the requirement to match "balanced quotes" ".
This would be a very tricky task for a Perl Regular Expression, and I'm fairly certain it's not possible. (In contrast, it could probably be done with Microsoft's "balancing groups" Regex feature).

I would suggest creating your own parser. As you process each character, you count each " and {}, and only split on , if they are "balanced".




回答6:


A split solution seems simplest. Split on a lookahead of your main variable aaa, with word boundary around. Strip trailing whitespace and comma with an optional character group.

$string = 'aaa="bbb{ccc}ddd" { aa="bb,cc" { a="b", c="d" } }, aaa="bbb{}" { aa="b}b" }, aaa="bbb,ccc"';
my @array = split /[,\s]*(?=\baaa\b)/, $string;


来源:https://stackoverflow.com/questions/7974093/matching-balanced-parenthesis-in-perl-regex

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!