Parsing of optionals with PEG (Grako) falling short?

Posted by 北战南征 on 2019-12-01 03:52:54

Question


My colleague PaulS asked me the following:


I'm writing a parser for an existing language (SystemVerilog - an IEEE standard), and the specification has a rule in it that is similar in structure to this:

cover_point 
    = 
    [[data_type] identifier ':' ] 'coverpoint' identifier ';' 
    ;

data_type 
    = 
    'int' | 'float' | identifier 
    ;

identifier 
    = 
    ?/\w+/? 
    ;

The problem is that when parsing the following legal string:

anIdentifier: coverpoint another_identifier;

anIdentifier matches with data_type (via its identifier option) successfully, which means Grako is looking for another identifier after it and then fails. It doesn't then try to parse without the data_type part.

I can re-write the rule as follows,

cover_point_rewrite  
    = 
    [data_type identifier ':' | identifier ':' ] 'coverpoint' identifier ';' 
    ;

but I wonder if:

  1. this is intentional and
  2. if there's a better syntax?

Is this a PEG-in-general issue, or a tool (Grako) one?


Answer 1:


In PEGs the choice operator is ordered: the parser commits to the first alternative that matches, which is how PEGs avoid the ambiguities of CFGs.

In your first example

[data_type]
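succeeds by consuming the first identifier (anIdentifier), so the parse fails when it then finds : instead of another identifier. That is because [data_type] behaves like (data_type | ε): once data_type has matched, the parser never goes back to try the empty alternative, so the first identifier is always parsed as a data_type. This is how PEGs work in general, not something specific to Grako.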

In

[data_type identifier ':' | identifier ':' ]
the first choice fails when there is no second identifier, so the parser backtracks to the start of the choice and tries the second alternative, which matches.
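
To make the difference concrete, here is a minimal sketch of PEG-style matching in Python. It is not Grako's implementation; the helper names (tokenize, seq, choice, optional, matches) are hypothetical and exist only for this illustration. Each parser takes a token list and a position and returns the new position, or None on failure.

```python
import re

def tokenize(text):
    # Split into words and the two punctuation tokens used by the grammar.
    return re.findall(r"\s*(\w+|[:;])", text)

def literal(word):
    def parse(toks, i):
        return i + 1 if i < len(toks) and toks[i] == word else None
    return parse

def identifier(toks, i):
    # Same as the grammar's /\w+/ rule: any word, keywords included.
    if i < len(toks) and re.fullmatch(r"\w+", toks[i]):
        return i + 1
    return None

def seq(*parsers):
    def parse(toks, i):
        for p in parsers:
            i = p(toks, i)
            if i is None:
                return None
        return i
    return parse

def choice(*parsers):
    # Ordered choice: the first alternative that succeeds wins.
    def parse(toks, i):
        for p in parsers:
            j = p(toks, i)
            if j is not None:
                return j
        return None
    return parse

def optional(p):
    # Equivalent to (p | ε): if p succeeds, its result is final.
    def parse(toks, i):
        j = p(toks, i)
        return i if j is None else j
    return parse

data_type = choice(literal("int"), literal("float"), identifier)

# [[data_type] identifier ':' ] 'coverpoint' identifier ';'
cover_point = seq(
    optional(seq(optional(data_type), identifier, literal(":"))),
    literal("coverpoint"), identifier, literal(";"),
)

# [data_type identifier ':' | identifier ':' ] 'coverpoint' identifier ';'
cover_point_rewrite = seq(
    optional(choice(
        seq(data_type, identifier, literal(":")),
        seq(identifier, literal(":")),
    )),
    literal("coverpoint"), identifier, literal(";"),
)

def matches(rule, text):
    toks = tokenize(text)
    return rule(toks, 0) == len(toks)

text = "anIdentifier: coverpoint another_identifier;"
print(matches(cover_point, text))          # False
print(matches(cover_point_rewrite, text))  # True
```

In the first rule, optional(data_type) consumes anIdentifier and is never retried, so the enclosing sequence fails at the colon. In the rewrite, the failure happens inside the first alternative of an ordered choice, so the second alternative gets its turn from the original position.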

Source: https://stackoverflow.com/questions/24600189/parsing-of-optionals-with-peg-grako-falling-short
