I'm redoing a minilanguage I originally built on Perl (see Chessa# on github), but I'm running into a number of issues when I go to apply semantics.
(* integers *)
DEC = /([1-9][0-9]*|0+)/;
int = /(0b[01]+|0o[0-7]+|0x[0-9a-fA-F]+)/ | DEC;
(* floats *)
pointfloat = /([0-9]*\.[0-9]+|[0-9]+\.)/;
expfloat = /([0-9]+\.?|[0-9]*\.)[eE][+-]?[0-9]+/;
float = pointfloat | expfloat;
list = '[' @+:atom {',' @+:atom}* ']';
(* atoms *)
identifier = /[_a-zA-Z][_a-zA-Z0-9]*/;
symbol = int |
float |
identifier |
list;
(* functions *)
arglist = @+:atom {',' @+:atom}*;
function = identifier '(' [arglist] ')';
atom = function | symbol;
prec8 = '(' atom ')' | atom;
prec7 = [('+' | '-' | '~')] prec8;
prec6 = prec7 ['!'];
prec5 = [prec6 '**'] prec6;
prec4 = [prec5 ('*' | '/' | '%' | 'd')] prec5;
prec3 = [prec4 ('+' | '-')] prec4;
(* <| and >| are rotate-left and rotate-right, respectively. They assume the nearest C size. *)
prec2 = [prec3 ('<<' | '>>' | '<|' | '>|')] prec3;
prec1 = [prec2 ('&' | '|' | '^')] prec2;
expr = prec1 $;
The issue I'm running into is that the d
operator is being pulled into the identifier rule when no whitespace exists between the operator and any following alphanumeric strings. While the grammar itself is LL(2), I don't understand where the issue is here.
For instance, 4d6
stops the parser because it's being interpreted as 4
d6
, where d6
is an identifier. What should occur is that it's interpreted as 4
d
6
, with the d
being an operator. In an LL parser, this would indeed be the case.
A possible solution would be to disallow d
from beginning an identifier, but this would disallow functions such as drop
from being named as such.
The problem with your example is that Grako has the nameguard
feature enabled by default, and that won't allow parsing just the d
when d6
is ahead.
To disable the feature, instantiate your own Buffer
and pass it to an instance of the generated parser:
from grako.buffering import Buffer
from myparser import MyParser
# get the text
parser = MyParser()
parser.parse(Buffer(text, nameguard=False), 'expre')
The tip version of Grako in the Bitbucket repository adds a --no-nameguard
command-line option to generated parsers.
In Perl, you can use Marpa, a general BNF parser, which supports generalized precedence with associativity (and many more) out of the box, e.g.
:start ::= Script
Script ::= Expression+ separator => comma
comma ~ [,]
Expression ::=
Number bless => primary
| '(' Expression ')' bless => paren assoc => group
|| Expression '**' Expression bless => exponentiate assoc => right
|| Expression '*' Expression bless => multiply
| Expression '/' Expression bless => divide
|| Expression '+' Expression bless => add
| Expression '-' Expression bless => subtract
Full working example is here. As for programming languages, there is a C parser based on Marpa.
Hope this helps.
来源:https://stackoverflow.com/questions/26016475/rule-precedence-issue-with-grako