I have a database with a number of fields containing comma-separated values. I need to split these fields in Perl, which is straightforward enough except that some of the va
The solution you have chosen is superior, but for anyone who claims a regular expression can't do this: Perl regular expressions have a recursion feature (available since Perl 5.10) that can match nested parentheses. The following works fine:
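use strict;
use warnings;
use 5.010;    # recursive subpatterns such as (?2) need Perl 5.10 or later

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

my @parts;
push @parts, $1 while $s =~ /
    (                                # group 1: one whole field
        (?:
            [^(),]+                  # plain text without commas or parentheses
          |
            ( \(                     # group 2: a parenthesised chunk
                (?: [^()]++ | (?2) )*    # which may contain further nested chunks
              \) )
        )+                           # "+" rather than "*", so no empty field is captured at the end
    )
    (?: ,\s* | $)                    # then a separator or the end of the string
/xg;

print "$_\n" for @parts;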
This works even if the parentheses are nested further. No, it's not pretty, but it does work!
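To see the nesting claim in action, the same recursive pattern can be wrapped in a reusable sub and fed a made-up string with two levels of parentheses. This is only a sketch: the sub name split_top_level and the sample input are illustrative, not from the question. It uses + for the field group so that no empty field is produced at the end of the string, and a possessive ++ inside the parentheses to limit backtracking.

```perl
use strict;
use warnings;
use 5.010;    # needed for the recursive subpattern (?2)

# Hypothetical helper name: split on commas that are outside all parentheses.
sub split_top_level {
    my ($s) = @_;
    my @parts;
    # Group 1 is one field; group 2 is a parenthesised chunk that recurses into itself.
    push @parts, $1 while $s =~ /
        ( (?: [^(),]+ | ( \( (?: [^()]++ | (?2) )* \) ) )+ )
        (?: ,\s* | $)
    /xg;
    return @parts;
}

# Made-up input with a group nested inside another group.
my @parts = split_top_level('a (b, (c, d) e), f (g), h');
print "$_\n" for @parts;
```

The comma between "c" and "d" sits two levels deep and the one after "b" one level deep; neither splits the field, because the recursive (?2) consumes each balanced group whole.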
Did anyone say you have to do it in one step? You could slice off values in a loop. Given your example, you could use something like this:
use strict;
use warnings;
use Data::Dumper;

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};

my @parts;
while (1) {
    # plain field: word characters and spaces, optionally followed by ", rest"
    my ($elem, $rest) = $s =~ m/^([\w\s]+)(?:,\s*(.*))?$/;
    if (not defined $elem) {
        # field ending in one parenthesised group, which may contain commas
        ($elem, $rest) = $s =~ m/^([\w\s]+\([^)]+\))(?:,\s*(.*))?$/;
    }
    last if not defined $elem;    # neither pattern matched; stop rather than push undef
    push @parts, $elem;
    last if not defined $rest or $rest eq '';    # nothing left to slice off
    $s = $rest;
}

print Dumper \@parts;
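A loop can also slice off values without limiting fields to word characters and spaces. One alternative, a different technique from the regex-based loop above, is to walk the string one character at a time and track parenthesis depth, splitting only on commas at depth zero. This is a sketch: split_by_depth is a hypothetical name, and it assumes parentheses are the only grouping characters and that surrounding whitespace should be trimmed from each field.

```perl
use strict;
use warnings;

# Hypothetical helper: split on commas that are not inside any parentheses.
sub split_by_depth {
    my ($s) = @_;
    my @parts;
    my $depth = 0;
    my $field = '';
    for my $ch (split //, $s) {
        if ($ch eq ',' and $depth == 0) {
            push @parts, $field;    # a top-level comma ends the current field
            $field = '';
            next;
        }
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')' and $depth > 0;    # a stray ")" is kept as text
        $field .= $ch;
    }
    push @parts, $field;            # the last field has no trailing comma
    s/^\s+|\s+$//g for @parts;      # trim surrounding whitespace
    return @parts;
}

my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
print "$_\n" for split_by_depth($s);
```

Because it counts depth instead of matching a fixed field shape, the same loop also handles fields with punctuation outside the parentheses and groups nested inside other groups.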
Try this:
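my $s = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
# split on ", " unless a ")" follows before any "(", i.e. unless the comma is inside parentheses
my @parts = split /(?![^(]+\)), /, $s;
print "$_\n" for @parts;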
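The lookahead only asks whether a ")" appears before the next "(", so it copes with one level of parentheses but not with nesting. A small sketch with a made-up nested input (not from the question) shows where it breaks; split_lookahead is just a hypothetical wrapper around the one-line split.

```perl
use strict;
use warnings;

# The one-line split, wrapped in a sub so it can be tried on other inputs.
sub split_lookahead {
    my ($s) = @_;
    return split /(?![^(]+\)), /, $s;
}

# Hypothetical input with a group nested inside the parentheses.
my $nested = 'a (b, (c) d), e';
print "$_\n" for split_lookahead($nested);
# The comma after "b" is followed by "(" before any ")", so the lookahead
# does not protect it and the parenthesised field is split in two.
```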
Another approach that uses a loop and split. I haven't tested the performance, but shouldn't this be faster than the look-ahead regexp solutions as the length of $str increases?
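use strict;
use warnings;
my $str = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
my @elems = split /,/, $str;
my @answer;
my @parens;
while (@elems) {
    # copy plain fields until one opens a parenthesis
    push @answer, shift @elems while @elems and $elems[0] !~ /\(/;
    last unless @elems;    # without these @elems checks, shift returns undef forever
    # gather pieces until the one that closes the parenthesis, then rejoin them
    push @parens, shift @elems while @elems and $elems[0] !~ /\)/;
    push @answer, join ",", @parens, (@elems ? shift @elems : ());
    @parens = ();
}
s/^\s+// for @answer;    # drop the space left after each separating comma
print "$_\n" for @answer;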
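The speed question can be measured rather than guessed with the core Benchmark module. This sketch times the lookahead split against the split-and-rejoin loop (with guards so the loop terminates once the array is empty) on a longer input built by repeating the sample. The sub names and the repetition factor of 1000 are arbitrary assumptions, and the timings it prints will depend on the machine and Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $str = q{recycling, environmental science, interdisciplinary (e.g., consumerism, waste management, chemistry, toxicology, government policy, and ethics), consumer education};
# Assumption: repeat the sample 1000 times to get a longer input.
my $long = join ', ', ($str) x 1000;

# Lookahead split: one regex pass over the string.
sub by_lookahead { return split /(?![^(]+\)), /, $_[0] }

# Split on every comma, then rejoin the pieces that belong inside parentheses.
sub by_rejoin {
    my @elems = split /,/, $_[0];
    my (@answer, @parens);
    while (@elems) {
        push @answer, shift @elems while @elems and $elems[0] !~ /\(/;
        last unless @elems;
        push @parens, shift @elems while @elems and $elems[0] !~ /\)/;
        push @answer, join ",", @parens, (@elems ? shift @elems : ());
        @parens = ();
    }
    s/^\s+// for @answer;
    return @answer;
}

# Sanity check: both approaches must agree before they are timed.
die "results differ\n" unless join("\0", by_lookahead($long)) eq join("\0", by_rejoin($long));

# Run each sub for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    lookahead => sub { my @p = by_lookahead($long) },
    rejoin    => sub { my @p = by_rejoin($long) },
});
```

Comparing the two results first guards against timing approaches that do not actually produce the same fields.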