问题
I have written a yacc file according to Tiger Book(appendix A, Tiger manual).
But there are still some shift/reduce conflicts. I do not know how to resolve these conflicts.
% yacc --version
bison (GNU Bison) 3.0.2
You can use this cmd to reproduce the problem:
% yacc -dvt tiger.y
tiger.y: warning: 37 shift/reduce conflicts [-Wconflicts-sr]
% cat tiger.y
:
%{
#include <stdio.h>
//#include "util.h"
//#include "errormsg.h"
int yylex(void); /* function prototype */
void yyerror(char *s)
{
EM_error(EM_tokPos, "%s", s);
}
%}
%union {
int pos;
int ival;
string sval;
}
%token <sval> ID STRING
%token <ival> INT
%token
COMMA COLON SEMICOLON LPAREN RPAREN LBRACK RBRACK
LBRACE RBRACE DOT
PLUS MINUS TIMES DIVIDE EQ NEQ LT LE GT GE
AND OR ASSIGN
ARRAY IF THEN ELSE WHILE FOR TO DO LET IN END OF
BREAK NIL
FUNCTION VAR TYPE
%right ASSIGN
%left OR
%left AND
%nonassoc EQ NEQ LT LE GT GE
%left PLUS MINUS
%left TIMES DIVIDE
%left UNARYMINUS
%precedence THEN
%precedence ELSE
%start program
%%
program: exp { }
;
exp:lvalue { }
|NIL { }
|LPAREN explist RPAREN { }
|LPAREN RPAREN {}
|INT {}
|STRING {}
|MINUS exp %prec UNARYMINUS {}
|ID LPAREN RPAREN {}
|ID LPAREN arglist RPAREN {}
|exp PLUS exp {}
|exp MINUS exp {}
|exp TIMES exp {}
|exp DIVIDE exp {}
|exp EQ exp {}
|exp NEQ exp {}
|exp LT exp {}
|exp LE exp {}
|exp GT exp {}
|exp GE exp {}
|exp AND exp {}
|exp OR exp {}
|ID LBRACE RBRACE {}
|ID LBRACE idlist RBRACE {}
|ID LBRACK exp RBRACK OF exp {}
|lvalue ASSIGN exp {}
|IF exp THEN exp ELSE exp {}
|IF exp THEN exp {}
|WHILE exp DO exp {}
|FOR ID ASSIGN exp TO exp DO exp {}
|BREAK {}
|LET decs IN END {}
|LET decs IN explist END {}
;
lvalue: ID {}
| lvalue DOT ID {}
| lvalue LBRACK exp RBRACK {}
;
explist: exp {}
| explist SEMICOLON exp {}
;
arglist:exp {}
|exp COMMA arglist {}
;
idlist:ID EQ exp {}
|ID EQ exp COMMA idlist {}
;
decs:dec {}
|decs dec {}
;
dec:tydec {}
|vardec {}
|fundec {}
;
tydec:TYPE ID EQ ty {}
;
ty:ID {}
|LBRACK tyfields RBRACK {}
|ARRAY OF ID {}
;
tyfields:/* NULL */
|notnulltyfields {}
;
notnulltyfields:ID COLON ID {}
|ID COLON ID COMMA notnulltyfields {}
;
vardec:VAR ID ASSIGN exp {}
|VAR ID COLON ID ASSIGN exp {}
;
fundec:FUNCTION ID LPAREN tyfields RPAREN EQ exp {}
|FUNCTION ID LPAREN tyfields RPAREN COLON ID EQ exp {}
;
回答1:
The shift-reduce conflicts are easily discovered by looking at the tiger.output
file produced with the -v
flag.
Here's an example (I edited out the repetition):
State 88
11 exp: exp . PLUS exp
12 | exp . MINUS exp
# ...
29 | WHILE exp DO exp .
PLUS shift, and go to state 34
MINUS shift, and go to state 35
# ...
PLUS [reduce using rule 29 (exp)]
MINUS [reduce using rule 29 (exp)]
# ...
$default reduce using rule 29 (exp)
We can see that State 88 occurs when a WHILE
expression can be reduced (that's evident from the position of the .
in the state description:
29 | WHILE exp DO exp .
If the lookahead token at this point is a binary operator, the parser doesn't know whether to shift the operator, making the trailing exp
in the WHILE
expression longer, or to immediately reduce the WHILE
. Obviously (to us, not to bison
), the solution is to shift. bison
doesn't know that because the production exp: WHILE exp DO exp
doesn't have a precedence. The precedence of that production would be the precedence of its last terminal, which is DO
, so the simple solution is to define a precedence for DO
. Unsurprisingly, it should be the same as the precedence of ELSE
, as shown by the fact that IF exp THEN exp ELSE exp .
does not produce a shift/reduce conflict.
A similar problem occurs in states 112 and 129.
The shift/reduce conflict in state 1 is also clear from the output
file:
State 1
9 exp: ID . LPAREN RPAREN
10 | ID . LPAREN arglist RPAREN
23 | ID . LBRACE RBRACE
24 | ID . LBRACE idlist RBRACE
25 | ID . LBRACK exp RBRACK OF exp
34 lvalue: ID .
LPAREN shift, and go to state 15
LBRACK shift, and go to state 16
LBRACE shift, and go to state 17
LBRACK [reduce using rule 34 (lvalue)]
$default reduce using rule 34 (lvalue)
Here, the parser has just found an ID
in a context in which an exp
might be reduced, and it faces two possibilities:
shift. The
exp
isID [exp] OF exp
, so that in the end the result will be:ID '[' exp ']' OF exp --> exp (rule 25)
reduce. The
exp
is the lvalueID[exp]
, using the following productions:ID --> lvalue (rule 34) lvalue '[' exp ']' --> lvalue (rule 36) lvalue --> exp (rule 2)
In order to use the second alternative, the parser must immediately reduce ID
to lvalue
, and therein lies the problem: the parser cannot know which of these two possibilities is correct until it sees the OF
following the matching ], but that's far in the future -- in fact, it could be an arbitrary number of tokens away.
The solution here is to avoid forcing the parser to make a decision at this point. There are a couple of possibilities.
Since an expression can only be
ID [ exp ] OF
(and not anything more complicated), we can factorID
out of the conflict:exp : ID | lvalue_not_id | ... lvalue: ID | lvalue_not_id lvalue_not_ID : lvalue DOT ID | ID LBRACK exp RBRACK | lvalue_not_ID LBRACK exp RBRACK
Comparing the current state machine with the state machine after this change should make it clear how that works (and is a useful exercise in learning about bottom-up parsing).
If you don't want to go to all that work, you can simply add an "apparently redundant" production, as suggested by Appel in his textbook:
lvalue: ID | lvalue DOT ID | lvalue LBRACK exp RBRACK | ID LBRACK exp RBRACK
The added production to
lvalue
is obviously going to create a shift-reduce conflict; indeed, it is exactly the same shift-reduce conflict as in the original grammar. But this time, the conflict is between two different productions forlvalue
, and the default shift action is definitely the one you want to take in the case of a bareID
followed by a [. After the shift, both thelvalue
production and theexp
production are still available, so the parser doesn't have to make a decision until it finds the token after the ].The downside with this solution is that the parser generator will continue to report a shift-reduce conflict, since there clearly is one. Since shift-reduce conflicts are generally considered a sign that the grammar may be ambiguous, leaving a shift-reduce conflict in the code will be a long-term maintenance issue: after every grammar change, it will be necessary to verify that the shift-reduce conflict is benign.
Another solution, which also unfortunately leaves the warning in place, is to use bison's
%glr-parser
directive to generate a GLR parser. The GLR algorithm is capable of delaying reduction decisions by effectively maintaining two (or more) different possible parser stacks at the same time. For unambiguous grammars, this turns out to still be O(n) in the length of the input, but it is slightly slower. (Also, this option is not available in many other yacc derivatives.)Finally, you could get rid of
lvalue
by just adding its productions toexp
. You would then need to generalize thelvalue [ exp ]
to beexp [ exp ]
, which means the grammar would recognize a a superset of the original language: it will now accept certain inputs which are not valid. However, it is easy to check in the semantic actions for the relevant productions to see whether anexp
has the form of anlvalue
or not; if it is not, you can generate a syntax error in the semantic action.
来源:https://stackoverflow.com/questions/26968665/bison-shift-reduce-conflict-tiger-compiler