Split on comma, but only when not in parentheses

被撕碎了的回忆 2021-01-02 23:42

I am trying to split a string on a comma delimiter:

my $string = 'ab,12,20100401,xyz(A,B)';
my @array = split(',', $string);

If I do

6 Answers
  • 2021-01-02 23:56

    Here is one way that should work.

    use Regexp::Common;
    
    my $string = 'ab,12,20100401,xyz(A,B)';
    my @array = ($string =~ /(?:$RE{balanced}{-parens=>'()'}|[^,])+/g);
    

    Regexp::Common can be installed from CPAN.

    Be warned that there is a bug in this code, coming from the depths of Regexp::Common: it will (unfortunately) fail to return the empty field between two adjacent commas (,,).
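
    The empty-field caveat can be reproduced without the module. This is only a sketch in core Perl (5.10 or later): the recursive pattern is my own stand-in for $RE{balanced}{-parens=>'()'}, not the module's actual implementation, and fields_nonempty is a made-up name. Because the outer group is quantified with +, every match needs at least one character, so nothing comes back for the gap between two adjacent commas.

    ```perl
    use strict;
    use warnings;

    # Group 1 matches one balanced (...) group, recursing via (?1).
    # Assumption: this stands in for $RE{balanced}{-parens=>'()'}.
    sub fields_nonempty {
        my ($s) = @_;
        my @fields;
        while ($s =~ /(?:(\((?:[^()]++|(?1))*\))|[^,])+/g) {
            push @fields, $&;    # whole match, not just group 1
        }
        return @fields;
    }

    # The empty field between the two commas is silently dropped.
    print join('|', fields_nonempty('ab,,xyz(A,B)')), "\n";    # prints ab|xyz(A,B)
    ```

    If empty fields matter for your data, one of the other approaches in this thread is probably a better fit.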

  • 2021-01-03 00:03

    This is an old question, but I wrestled with this all night and the question was never marked as answered, so in case anyone else arrives here from Google as I did, here is what I finally got. It's a very short answer using only built-in Perl regex features:

    my $string='ab,12,20100401,xyz(A,B)';
    $string =~ s/((\((?>[^)(]*(?2)?)*\))|[^,()]*)(*SKIP)([,])/$1\n/g;
    my @array=split(/\n/,$string);
    

    Commas that are not inside parentheses are changed to newlines and then the array is split on them. This will ignore commas inside any level of nested parentheses, as long as they're properly balanced with a matching number of open and close parens.

    This assumes you won't have newline \n characters in the initial value of $string. If you need to, either temporarily replace them with something else before the substitution line and then use a loop to replace back after the split, or just pick a different delimiter to split the array on.
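
    The "pick a different delimiter" option could look roughly like this. It is a sketch, not a drop-in: split_outside_parens is a hypothetical helper name, and the choice of "\x1F" (the ASCII unit separator) is my assumption about a character that will not occur in your data, which the code checks before substituting.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: same kind of substitution, but with "\x1F"
    # as the temporary separator instead of "\n".
    sub split_outside_parens {
        my ($s) = @_;
        my $sep = "\x1F";    # assumed never to appear in real input
        die "input already contains the separator\n" if index($s, $sep) >= 0;
        # Replace each comma that is outside balanced parentheses.
        $s =~ s/((\((?>[^)(]*(?2)?)*\))|[^,()]*)(*SKIP),/$1$sep/g;
        return split /\x1F/, $s, -1;    # -1 keeps trailing empty fields
    }

    print "$_\n" for split_outside_parens('ab,12,20100401,xyz(A,B)');
    ```

    With this variant a value containing newlines passes through untouched, since the newline is no longer used as the intermediate marker.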

  • 2021-01-03 00:13

    Limit the number of elements it can be split into:

    split(',', $string, 4)
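
    Note that the limit only helps when the parenthesized part is the last field and you know the number of fields in advance. A small sketch of both cases (the second input string is invented for illustration):

    ```perl
    use strict;
    use warnings;

    # Works: the only parenthesized field is the last one.
    my @ok  = split(',', 'ab,12,20100401,xyz(A,B)', 4);
    # Breaks: an earlier field has a comma inside parentheses.
    my @bad = split(',', 'ab,f(1,2),20100401,xyz(A,B)', 4);

    print join('|', @ok),  "\n";    # prints ab|12|20100401|xyz(A,B)
    print join('|', @bad), "\n";    # prints ab|f(1|2)|20100401,xyz(A,B)
    ```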
    
  • 2021-01-03 00:13

    Here's another way:

    my $string='ab,12,20100401,xyz(A,B)';
    my @array = ($string =~ /(
        [^,]*\([^)]*\)   # comma inside parens is part of the word
        |
        [^,]*)           # split on comma outside parens
        (?:,|$)/gx);
    

    Produces:

    ab
    12
    20100401
    xyz(A,B)
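
    Two caveats are worth checking before relying on this pattern. As far as I can tell, it can also match the empty string at the very end of the input, so the list gets an extra empty element after the last field, and it handles only one level of parentheses. The sketch below wraps the pattern in a hypothetical fields_one_level helper and drops that trailing element; the cleanup rule is my own assumption about what is wanted.

    ```perl
    use strict;
    use warnings;

    # Hypothetical wrapper around the pattern from this answer.
    sub fields_one_level {
        my ($s) = @_;
        my @fields = ($s =~ /(
            [^,]*\([^)]*\)   # comma inside parens is part of the word
            |
            [^,]*)           # plain field
            (?:,|$)/gx);
        # Drop the empty match at end of input, unless the string really
        # ends with a comma (a genuine empty last field).
        pop @fields if @fields && $fields[-1] eq '' && $s !~ /,\z/;
        return @fields;
    }

    print "$_\n" for fields_one_level('ab,12,20100401,xyz(A,B)');
    ```

    For nested input such as xyz(A(2,3),B) the fields are cut in the wrong place, so one of the recursive or depth-counting answers is needed there.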
    
  • 2021-01-03 00:17
    use Text::Balanced qw(extract_bracketed);
    my $string = "ab,12,20100401,xyz(A,B(a,d))";
    my @params = ();
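    while (length $string) {
        if ($string =~ /^([^(]*?),/) {
            push @params, $1;
            $string =~ s/^\Q$1\E\s*,?\s*//;
        } else {
            my ($ext, $pre);
            ($ext, $string, $pre) = extract_bracketed($string,'()','[^()]+');
            # No bracketed group left: the remainder is the last plain field.
            unless (defined $ext) { push @params, $string; last; }
            push @params, "$pre$ext";
            $string =~ s/^\s*,\s*//;
        }
    }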
    

    This one supports:

    • nested parentheses;
    • empty fields;
    • strings of any length.
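
    The else branch of the loop depends on what extract_bracketed returns in list context: the bracketed group, the remainder of the text, and the skipped prefix. A minimal sketch of the success and failure cases, with input strings invented for illustration:

    ```perl
    use strict;
    use warnings;
    use Text::Balanced qw(extract_bracketed);

    # Success: prefix, bracketed group, and remainder come back separately.
    my $text = 'xyz(A,B(a,d)),tail';
    my ($ext, $rest, $pre) = extract_bracketed($text, '()', '[^()]+');
    print "$pre | $ext | $rest\n";    # prints xyz | (A,B(a,d)) | ,tail

    # Failure: no bracketed group follows the prefix, so $ext2 is undef.
    my $plain = 'plain';
    my ($ext2) = extract_bracketed($plain, '()', '[^()]+');
    print defined $ext2 ? "found\n" : "no bracketed group\n";
    ```

    The failure case is the one to watch for when the last field has no parentheses at all.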
  • 2021-01-03 00:21

    Here is my attempt. It should handle depth well and could even be extended to include other bracketed symbols easily (though harder to be sure that they MATCH). This method will not in general work for quotation marks rather than brackets.

    #!/usr/bin/perl
    
    use strict;
    use warnings;
    
    my $string='ab,12,20100401,xyz(A(2,3),B)';
    
    print "$_\n" for parse($string);
    
    sub parse {
      my ($string) = @_;
      my @fields;
    
      my @comma_separated = split(/,/, $string);
    
      my @to_be_joined;
      my $depth = 0;
      foreach my $field (@comma_separated) {
        my @brackets = $field =~ /(\(|\))/g;
        foreach (@brackets) {
          $depth++ if /\(/;
          $depth-- if /\)/;
        }
    
        if ($depth == 0) {
          push @fields, join(",", @to_be_joined, $field);
          @to_be_joined = ();
        } else {
          push @to_be_joined, $field;
        }
      }
    
      return @fields;
    }
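
    The extension to other bracket types mentioned above could be sketched as follows. Note that this is a different technique from the code in this answer: instead of splitting on every comma and rejoining, it scans character by character and keeps a stack of expected closers, which is one way to make sure the brackets actually match. The name split_top_level and the choice of (), [] and {} are assumptions for illustration.

    ```perl
    use strict;
    use warnings;

    # Hypothetical helper: split on commas that are outside all brackets.
    sub split_top_level {
        my ($s) = @_;
        my %close_for = ('(' => ')', '[' => ']', '{' => '}');
        my %is_close  = map { $_ => 1 } values %close_for;
        my (@fields, @stack);
        my $cur = '';
        for my $ch (split //, $s) {
            if ($ch eq ',' && !@stack) {    # top-level comma ends a field
                push @fields, $cur;
                $cur = '';
                next;
            }
            if (exists $close_for{$ch}) {
                push @stack, $close_for{$ch};    # remember expected closer
            } elsif ($is_close{$ch}) {
                die "Unbalanced '$ch'\n" unless @stack && $stack[-1] eq $ch;
                pop @stack;
            }
            $cur .= $ch;
        }
        die "Unclosed bracket\n" if @stack;
        push @fields, $cur;
        return @fields;
    }

    print "$_\n" for split_top_level('ab,12,20100401,xyz(A(2,3),B)');
    ```

    Like the original, this does not handle quotation marks, and it dies on mismatched brackets rather than guessing.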
    