I am trying to parse a CSV file and access a named regex inside a proto regex in Perl 6, but the result turns out to be Nil. What is the proper way to do it?
I am trying to parse a csv file
Perhaps you are focused on learning Perl 6 parsing and are writing some throwaway code. But if you want industrial strength CSV parsing out of the box, please be aware of the Text::CSV modules[1].
I am trying to access a named regex
If you are learning Perl 6 parsing, please be aware of jnthn's grammar tracer and debugger[2].
in proto regex in Perl6
Your issue is unrelated to it being a proto regex.
Instead, the issue is that while the match object corresponding to your named capture is stored in the overall match object you stored in $m1, it is not stored precisely where you are looking for it.
To see what's going on, I'll start by simulating what you were trying to do. I'll use a regex that declares just one capture, a "named" (aka "Associative") capture that matches the string ab.
given 'ab'
{
my $m1 = m/ $<named-capture> = ( ab ) /;
say $m1<named-capture>;
# 「ab」
}
The match object corresponding to the named capture is stored where you'd presumably expect it to appear within $m1, at $m1<named-capture>.
But you were getting Nil with $m1<oneCSV>. What gives?
$m1<oneCSV> did not work

There are two types of capture: named (aka "Associative") and numbered (aka "Positional"). The parens you wrote in your regex that surrounded <oneCSV> introduced a numbered capture:
given 'ab'
{
my $m1 = m/ ( $<named-capture> = ( ab ) ) /; # extra parens added
say $m1[0]<named-capture>;
# 「ab」
}
The parens in / ( ... ) / declare a single top level numbered capture. If it matches, then the corresponding match object is stored in $m1[0]. (If your regex looked like / ... ( ... ) ... ( ... ) ... ( ... ) ... / then another match object corresponding to what matches the second pair of parentheses would be stored in $m1[1], another in $m1[2] for the third, and so on.)
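As a minimal sketch of that numbering (the input string 'abc' and the variable $m are made up for illustration and are not from the original question), three sibling pairs of parens yield three positional captures:

```raku
# Hypothetical input, chosen only to illustrate positional numbering.
my $m = 'abc' ~~ / (a) (b) (c) /;

say $m[0];            # 「a」  (first pair of parens)
say $m[1];            # 「b」  (second pair of parens)
say $m[2];            # 「c」  (third pair of parens)
say $m.list.elems;    # count of top level positional captures
```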
The match result for $<named-capture> = ( ab ) is then stored inside $m1[0]. That's why say $m1[0]<named-capture> works.
So far so good. But this is only half the story...
$m1[0]<oneCSV> in your code would not work either

While $m1[0]<named-capture> in the code immediately above works, you would still not get a match object in $m1[0]<oneCSV> in your original code. This is because you also used a * quantifier, which asks for multiple matches of the zeroth capture:
given 'ab'
{
my $m1 = m/ ( $<named-capture> = ( ab ) )* /; # * is a quantifier
say $m1[0][0]<named-capture>;
# 「ab」
}
Because the * quantifier asks for multiple matches, Perl 6 writes a list of match objects into $m1[0]. (In this case there's only one such match, so you end up with a list of length 1, i.e. just $m1[0][0] and not $m1[0][1], $m1[0][2], etc.)
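To see the list grow, here is a small sketch that reuses the same regex on a longer, made-up input ('abab' is an assumption for illustration), so the quantified capture matches twice:

```raku
# Same regex as above, hypothetical input with two repetitions of "ab".
my $m = 'abab' ~~ / ( $<named-capture> = ( ab ) )* /;

say $m[0].elems;                # 2, one entry per repetition of the group
say $m[0][0]<named-capture>;    # 「ab」  (first repetition)
say $m[0][1]<named-capture>;    # 「ab」  (second repetition)
```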
Captures nest; a capture quantified by either * or + corresponds to two levels of nesting, not just one.
In your original code, you'd have to write say $m1[0][0]<oneCSV>; to get to the match object you're looking for.
[1] Install relevant modules and write use Text::CSV; (for a pure Perl 6 implementation) or use Text::CSV:from<Perl5>; (for a Perl 5 plus XS implementation) at the start of your code. (talk slides (click on top word, e.g. "csv", to advance through slides), video, Perl 6 module, Perl 5 XS module.)
[2] Install relevant modules and write use Grammar::Tracer; or use Grammar::Debugger; at the start of your code. (talk slides, video, modules.)
The match for <oneCSV> lives within the scope of the capture group, which you get via $m1[0].
As the group is quantified with *, the result will again be a list, i.e. you need another indexing operation to get at a match object, e.g. $m1[0][0] for the first one.
The named capture can then be accessed by name, e.g. $m1[0][0]<oneCSV>. This will already contain the match result of the appropriate branch of the proto regex.
If you want the whole list of matches instead of a specific one, you can use >> or map, e.g. $m1[0]>>.<oneCSV>.
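Putting the pieces together, here is a rough sketch. The original grammar is not shown in this thread, so the oneCSV regex below is a deliberately simplified, hypothetical stand-in (a plain lexical regex rather than a proto regex) and the input string is made up; only the shape of the capture structure is meant to carry over:

```raku
# Hypothetical stand-in for the asker's <oneCSV>: one field plus an optional comma.
my regex oneCSV { <-[,]>+ ','? }

# Parens add a positional capture; * turns it into a list of matches.
my $m1 = 'a,b,c' ~~ / ( <oneCSV> )* /;

say $m1<oneCSV>;                    # Nil: nothing is stored at the top level
say $m1[0][0]<oneCSV>;              # 「a,」 (first repetition of the group)
say $m1[0]>>.<oneCSV>;              # all <oneCSV> matches, via the hyper operator
say $m1[0].map({ .<oneCSV> });      # the same list, via map
```

Dropping the surrounding parens, as in / <oneCSV>* /, would be another option if the extra positional level is not needed; in that case the list of matches should be available directly as $m1<oneCSV>.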