Parsing a sequence of expressions using yacc

坚强是说给别人听的谎言 提交于 2019-12-11 08:55:57

问题


I am trying to parse a sequence of expressions without delimiters in order to be able to parse ML/F# style function invocations:

myfunc expr1 expr2 expr3

However, the sequence of expressions is giving me a list of shift/reduce conflicts.

My guess is that the conflicts are caused by the recursive nature of my grammar, but I don't know how to fix these conflicts.

My (simplified) precedence rules and grammar looks like this:

/* Lowest precedence */
%left PLUS
%left TIMES
%left LPAR
/* Highest precedence */

Expr: 
  | CSTINT                                { CstI $1                   }
  | LPAR Expr RPAR                        { $2                        }
  | Expr TIMES Expr                       { Prim("*", $1, $3)         }
  | Expr PLUS Expr                        { Prim("+", $1, $3)         }
  | NAME ExprList                         { Call(Var $1, $2)          }

ExprList:
  |                                      { []                        }
  | Expr ExprList                        { $1::$2                    }

When I pass this to fsyacc I get a list of shift/reduce and reduce/reduce conflicts. An example shift/reduce conflict is

state 11: shift/reduce error on PLUS

The output from fsyacc for state 11 is:

state 11:
  items:
    Expr -> Expr . 'TIMES' Expr
    Expr -> Expr . 'PLUS' Expr
    ExprList -> Expr . ExprList

  actions:
    action 'EOF' (noprec):   reduce ExprList --> 
    action 'LPAR' (explicit left 10000):   shift 6
    action 'RPAR' (noprec):   reduce ExprList --> 
    action 'COMMA' (noprec):   reduce ExprList --> 
    action 'PLUS' (explicit left 9998):   shift 13
    action 'TIMES' (explicit left 9999):   shift 12
    action 'NAME' (noprec):   shift 14
    action 'CSTINT' (noprec):   shift 5
    action 'error' (noprec):   reduce ExprList --> 
    action '#' (noprec):   reduce ExprList --> 
    action '$$' (noprec):   reduce ExprList --> 

  immediate action: <none>
 gotos:
    goto Expr: 11
    goto ExprList: 16

It has been a while since I took a course in compiler theory, so while I kind of know what shift/reduce and reduce/reduce conflicts are, I am not used to thinking about them. In particular, I don't see how reducing on PLUS could lead to a valid parse. All in all, any insight into one or more of the following questions would be highly appreciated:

  1. Why is my grammar seemingly ambiguous?
  2. Can I fix it using precedence and/or associativity rules or, if not,
  3. Do I need to rewrite the grammar, and if so, roughly, how do I do it?
  4. Is yacc a suitable tool for a construct like this?

回答1:


1. Why is my grammar seemingly ambiguous?

Your grammar is ambiguous. It's not an illusion.

Suppose f is a function.

f  x + 7

Is that f(x) + 7 or f(x+7)?. Your grammar produces both.

IIRC, function application binds very tightly and associates to the left. So the above expression should parse as f(x) + 7.

2. Can I fix it using precedence and/or associativity rules, or if not,

You can disambiguate function application with precedence and associativity rules; you just need to declare a precedence for it with %prec. However, it ends up looking a bit ugly and ...

3. Do I need to rewrite the grammar, and if so, roughly, how do I do it?

... I don't believe it is correct to represent function application as Name ExprList. It's a lot cleaner if you curry the arguments one at a time, at least while building the AST, and it looks prettier if you do it in the grammar rather than with precedence rules, which really weren't designed for invisible operators. See below.

4. Is yacc a suitable tool for a construct like this?

Sure, why not?

Here are two working (as far as I know) yacc grammars. The first uses precedence declarations for everything; the second separates out function application, which I think is cleaner:

// grammar1.y:

%left '+'
%left '*'
%left ATOM ';' '(' ')'

%%

program: /* empty */           { $$ = ""; }
       | program statement ';' { std::cout << $2 << std::endl; }
       | program ';'
       ;

statement: expr
         ;

expr: ATOM
    | '(' expr ')'             { $$ = $2; }
    | expr expr %prec ATOM     { $$ = '(' + $1 + ' ' + $2 + ')'; }
    | expr '+' expr            { $$ = "(+ " + $1 + ' ' + $3 + ')'; }
    | expr '*' expr            { $$ = "(* " + $1 + ' ' + $3 + ')'; }
    ;

// grammar2.y

%token ATOM

%left '+'
%left '*'

%%

program: /* empty */           { $$ = ""; }
       | program statement ';' { std::cout << $2 << std::endl; }
       | program ';'
       ;

statement: expr
         ;

term : ATOM
     | '(' expr ')'             { $$ = $2; }
     ;

apply: term
     | apply term              { $$ = '(' + $1 + ' ' + $2 + ')'; }
     ;

expr : apply
     | expr '+' expr            { $$ = "(+ " + $1 + ' ' + $3 + ')'; }
     | expr '*' expr            { $$ = "(* " + $1 + ' ' + $3 + ')'; }
     ;


来源:https://stackoverflow.com/questions/17933925/parsing-a-sequence-of-expressions-using-yacc

易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!