Question
My colleague PaulS asked me the following:
I'm writing a parser for an existing language (SystemVerilog - an IEEE standard), and the specification has a rule in it that is similar in structure to this:
cover_point
    =
    [[data_type] identifier ':'] 'coverpoint' identifier ';'
    ;

data_type
    =
    'int' | 'float' | identifier
    ;

identifier
    =
    ?/\w+/?
    ;
The problem is that when parsing the following legal string:
anIdentifier: coverpoint another_identifier;
anIdentifier matches data_type (via its identifier option) successfully, which means Grako then looks for another identifier after it and fails. It doesn't go back and try to parse without the data_type part.
I can re-write the rule as follows,
cover_point_rewrite
    =
    [data_type identifier ':' | identifier ':'] 'coverpoint' identifier ';'
    ;
but I wonder:
- is this intentional, and
- is there a better syntax?
Is this a PEG-in-general issue, or a tool (Grako) one?
Answer 1:
In PEGs the choice operator is ordered: it avoids the ambiguities of CFGs by always taking the first alternative that matches.
In your first example, [data_type] succeeds in parsing the first identifier, so the enclosing sequence fails when it finds ':' instead of another identifier.
That may be because [data_type] behaves like (data_type | ε), so it will always parse data_type with the first identifier, and once it has succeeded the parser does not go back to try the empty alternative.
In [data_type identifier ':' | identifier ':'], the first choice fails when there is no second identifier, so the parser backtracks and tries the second choice.
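To see that this is a property of PEG semantics rather than something specific to Grako, here is a minimal sketch of ordered choice and optionals written as plain Python functions. It does not use Grako at all; the helper names (lit, regex, seq, choice, opt, matches) are made up for this illustration, and whitespace skipping is a simplifying assumption.

```python
import re

# Each parser is a function (text, pos) -> new position, or None on failure.

def skip_ws(text, pos):
    while pos < len(text) and text[pos].isspace():
        pos += 1
    return pos

def lit(s):
    def parse(text, pos):
        pos = skip_ws(text, pos)
        return pos + len(s) if text.startswith(s, pos) else None
    return parse

def regex(pattern):
    compiled = re.compile(pattern)
    def parse(text, pos):
        pos = skip_ws(text, pos)
        m = compiled.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*parsers):
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    # Ordered choice: the first alternative that succeeds wins, and later
    # alternatives are never tried even if the surrounding rule fails.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def opt(parser):
    # [e] behaves like (e | ε): the empty match is used only if e itself fails.
    def parse(text, pos):
        end = parser(text, pos)
        return pos if end is None else end
    return parse

identifier = regex(r"\w+")
data_type = choice(lit("int"), lit("float"), identifier)

# [[data_type] identifier ':'] 'coverpoint' identifier ';'
cover_point = seq(
    opt(seq(opt(data_type), identifier, lit(":"))),
    lit("coverpoint"), identifier, lit(";"),
)

# [data_type identifier ':' | identifier ':'] 'coverpoint' identifier ';'
cover_point_rewrite = seq(
    opt(choice(seq(data_type, identifier, lit(":")),
               seq(identifier, lit(":")))),
    lit("coverpoint"), identifier, lit(";"),
)

def matches(rule, text):
    # True only if the rule consumes the whole input.
    return rule(text, 0) == len(text)

source = "anIdentifier: coverpoint another_identifier;"
print(matches(cover_point, source))          # original rule
print(matches(cover_point_rewrite, source))  # rewritten rule
```

In this sketch the original rule rejects the input because opt(data_type) has already consumed anIdentifier by the time the following identifier fails, whereas the rewritten rule moves the decision into an explicit choice, so the second alternative gets a chance from the original position.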
Source: https://stackoverflow.com/questions/24600189/parsing-of-optionals-with-peg-grako-falling-short