Question
I have a simple grammar for JavaCUP's LR(1) parser that recognises concatenation expressions of identifiers and strings. I also want to add some empty function calls as a possible concatenation argument. However, when I try that, it leads to a shift/reduce conflict.
Grammar:
precedence left PLUS;
e ::= e exp
| exp;
exp ::= concat
| literal;
concat ::= exp PLUS exp
| LPAREN exp RPAREN;
literal ::= IDENTIFIER
| STRING
| IDENTIFIER LPAREN RPAREN; // THIS PRODUCES THE ERROR
Input:
x + x + (x) // match
"foo" + x // match
(("goo") + (((y)))) // match
function_name() + x + "foo" + (other_func()) // what I also want
Conflict:
Warning : *** Shift/Reduce conflict found in state #12
between literal ::= IDENTIFIER (*)
and literal ::= IDENTIFIER (*) LPAREN RPAREN
under symbol LPAREN
I have tried many different things, such as hiding the call syntax behind an extra nonterminal (IDENTIFIER second in literal, with second ::= | LPAREN RPAREN;), but I can't make it work.
Answer 1:
The context in which this seems to come up is in expressions like
x + x()
where the parser, after seeing x + x, can't tell whether it's supposed to reduce x + x back to exp or shift the (. In other words, it can't tell whether to interpret the expression as
x + [x()]
or as
[x + x]()
I think you can address this by adding in a precedence rule that gives the open parenthesis in this particular context higher precedence than addition. That way, when the parser sees the shift and reduce action in this state, it knows to shift on an open parenthesis rather than reduce.
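To make the precedence idea concrete, here is a minimal Java sketch of the rule that yacc-style generators (CUP included) apply when both the production and the lookahead terminal have a declared precedence. The class name ConflictResolver, the method resolve and the numeric levels are made up for illustration; they are not part of CUP's API. A production normally takes the precedence of its last terminal, so for literal ::= IDENTIFIER that would be IDENTIFIER's level.

```java
/** Illustration only: how a yacc-style generator picks between shift and reduce. */
class ConflictResolver {
    enum Assoc { LEFT, RIGHT, NONASSOC }

    /**
     * rulePrec:   precedence level of the production that could be reduced
     * tokenPrec:  precedence level of the lookahead terminal (larger binds tighter)
     * tokenAssoc: associativity of the lookahead terminal, used only on a tie
     */
    static String resolve(int rulePrec, int tokenPrec, Assoc tokenAssoc) {
        if (tokenPrec > rulePrec) return "shift";   // lookahead wins
        if (tokenPrec < rulePrec) return "reduce";  // production wins
        switch (tokenAssoc) {
            case LEFT:  return "reduce";
            case RIGHT: return "shift";
            default:    return "error";             // nonassoc: neither is allowed
        }
    }

    public static void main(String[] args) {
        // Hypothetical levels: IDENTIFIER = 1, LPAREN = 2.
        System.out.println(resolve(1, 2, Assoc.LEFT));
    }
}
```

With LPAREN declared at a higher level than IDENTIFIER, the parser shifts the parenthesis instead of reducing the identifier. If either side has no declared precedence, this rule does not apply and the conflict is reported as usual.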
Answer 2:
Bison handles the following grammar with no shift/reduce conflicts:
%token IDENTIFIER STRING
%left IDENTIFIER
%left '('
%left '+'
%%
e : e exp
| exp
exp : concat
| literal
concat : exp '+' exp
| '(' exp ')'
literal: IDENTIFIER
| IDENTIFIER '(' ')'
| STRING
It is necessary to provide a precedence declaration for IDENTIFIER in order to give a precedence to the literal: IDENTIFIER production.
I found the grammar a bit odd, since it doesn't seem to allow concatenations to be parenthesized. But I'm sure there are reasons for that.
The above will work fine as long as function calls have no arguments, but it will not allow functions to be called with an argument, since that would be ambiguous. (That might be considered a good reason to not allow an invisible concatenation operator.) For what it's worth, awk, which has both functions and concatenation without an operator, solves this ambiguity lexically: an identifier followed immediately by (, without intervening whitespace, is tokenized as a FUNC_NAME, while an identifier followed by whitespace or by any symbol other than ( is tokenized as NAME.
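A rough Java sketch of that awk-style lexical rule, in case it helps. Everything here is an assumption made for illustration: the class name AwkStyleLexer, the string-based token representation and the use of Java's identifier rules. A real CUP setup would return Symbol objects from a generated scanner instead.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: decides FUNC_NAME vs NAME by peeking at the next character. */
class AwkStyleLexer {

    /** Returns tokens as strings such as "FUNC_NAME(f)", "NAME(x)", "STRING(foo)", "PLUS". */
    static List<String> tokenize(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '+') {
                tokens.add("PLUS");
                i++;
            } else if (c == '(') {
                tokens.add("LPAREN");
                i++;
            } else if (c == ')') {
                tokens.add("RPAREN");
                i++;
            } else if (c == '"') {
                int end = src.indexOf('"', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string at " + i);
                }
                tokens.add("STRING(" + src.substring(i + 1, end) + ")");
                i = end + 1;
            } else if (Character.isJavaIdentifierStart(c)) {
                int start = i;
                while (i < src.length() && Character.isJavaIdentifierPart(src.charAt(i))) {
                    i++;
                }
                String name = src.substring(start, i);
                // The awk rule: '(' directly after the name, with no whitespace, marks a call.
                boolean call = i < src.length() && src.charAt(i) == '(';
                tokens.add((call ? "FUNC_NAME(" : "NAME(") + name + ")");
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("function_name() + x (y)"));
    }
}
```

Because the decision is made before the parser sees the token, the grammar can use FUNC_NAME LPAREN RPAREN for calls and NAME for plain identifiers, and the parser never has to choose under LPAREN.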
Another possible solution would be to require functions to be declared before use, and then use a symbol table and lexical feedback (i.e., passing information back from the parser to the lexer; in this case, the fact that a given identifier is a function).
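The symbol-table variant could look roughly like this in Java. The class FunctionTable and its methods are hypothetical names, and the sketch assumes the parser calls declareFunction from the action of some function-declaration production, which the grammar in the question does not have.

```java
import java.util.HashSet;
import java.util.Set;

/** Illustration only: a table shared by parser and lexer for lexical feedback. */
class FunctionTable {
    private final Set<String> functions = new HashSet<>();

    /** Called from a parser action once a function declaration has been recognised. */
    void declareFunction(String name) {
        functions.add(name);
    }

    /** Called by the lexer for every identifier it scans, to pick the token kind. */
    String classify(String identifier) {
        return functions.contains(identifier) ? "FUNC_NAME" : "NAME";
    }

    public static void main(String[] args) {
        FunctionTable table = new FunctionTable();
        table.declareFunction("other_func");
        System.out.println(table.classify("other_func") + " " + table.classify("x"));
    }
}
```

This only works if a declaration is always parsed before the lexer reaches the first use of the name, which is why the declare-before-use requirement matters.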
Source: https://stackoverflow.com/questions/36021518/concatenation-shift-reduce-conflict