问题
I'm redoing a minilanguage I originally built on Perl (see Chessa# on github), but I'm running into a number of issues when I go to apply semantics.
Here is the grammar:
(* integers *)
DEC = /([1-9][0-9]*|0+)/;
int = /(0b[01]+|0o[0-7]+|0x[0-9a-fA-F]+)/ | DEC;
(* floats *)
pointfloat = /([0-9]*\.[0-9]+|[0-9]+\.)/;
expfloat = /([0-9]+\.?|[0-9]*\.)[eE][+-]?[0-9]+/;
float = pointfloat | expfloat;
list = '[' @+:atom {',' @+:atom}* ']';
(* atoms *)
identifier = /[_a-zA-Z][_a-zA-Z0-9]*/;
symbol = int |
float |
identifier |
list;
(* functions *)
arglist = @+:atom {',' @+:atom}*;
function = identifier '(' [arglist] ')';
atom = function | symbol;
prec8 = '(' atom ')' | atom;
prec7 = [('+' | '-' | '~')] prec8;
prec6 = prec7 ['!'];
prec5 = [prec6 '**'] prec6;
prec4 = [prec5 ('*' | '/' | '%' | 'd')] prec5;
prec3 = [prec4 ('+' | '-')] prec4;
(* <| and >| are rotate-left and rotate-right, respectively. They assume the nearest C size. *)
prec2 = [prec3 ('<<' | '>>' | '<|' | '>|')] prec3;
prec1 = [prec2 ('&' | '|' | '^')] prec2;
expr = prec1 $;
The issue I'm running into is that the d
operator is being pulled into the identifier rule when no whitespace exists between the operator and any following alphanumeric strings. While the grammar itself is LL(2), I don't understand where the issue is here.
For instance, 4d6
stops the parser because it's being interpreted as 4
d6
, where d6
is an identifier. What should occur is that it's interpreted as 4
d
6
, with the d
being an operator. In an LL parser, this would indeed be the case.
A possible solution would be to disallow d
from beginning an identifier, but this would disallow functions such as drop
from being named as such.
回答1:
The problem with your example is that Grako has the nameguard
feature enabled by default, and that won't allow parsing just the d
when d6
is ahead.
To disable the feature, instantiate your own Buffer
and pass it to an instance of the generated parser:
from grako.buffering import Buffer
from myparser import MyParser
# get the text
parser = MyParser()
parser.parse(Buffer(text, nameguard=False), 'expre')
The tip version of Grako in the Bitbucket repository adds a --no-nameguard
command-line option to generated parsers.
回答2:
In Perl, you can use Marpa, a general BNF parser, which supports generalized precedence with associativity (and many more) out of the box, e.g.
:start ::= Script
Script ::= Expression+ separator => comma
comma ~ [,]
Expression ::=
Number bless => primary
| '(' Expression ')' bless => paren assoc => group
|| Expression '**' Expression bless => exponentiate assoc => right
|| Expression '*' Expression bless => multiply
| Expression '/' Expression bless => divide
|| Expression '+' Expression bless => add
| Expression '-' Expression bless => subtract
Full working example is here. As for programming languages, there is a C parser based on Marpa.
Hope this helps.
来源:https://stackoverflow.com/questions/26016475/rule-precedence-issue-with-grako