Concatenation shift-reduce conflict

Question

I have a simple grammar for JavaCUP's LR(1) parser that recognises concatenation expressions of identifiers and strings. I also want to add some empty function calls as a possible concatenation argument. However, when I try that, it leads to a shift/reduce conflict.

Grammar:

precedence left PLUS;

e ::= e exp
      | exp;

exp ::= concat
      | literal;

concat ::= exp PLUS exp
         | LPAREN exp RPAREN;


literal ::= IDENTIFIER
          | STRING
          | IDENTIFIER LPAREN RPAREN; // THIS PRODUCES THE ERROR

Input:

x + x + (x)            // match
"foo" + x              // match
(("goo") + (((y))))    // match

function_name() + x + "foo" + (other_func())    // what I also want

Conflict:

Warning : *** Shift/Reduce conflict found in state #12
between literal ::= IDENTIFIER (*) 
and     literal ::= IDENTIFIER (*) LPAREN RPAREN  
under symbol LPAREN

I have tried many different things, such as factoring the optional call suffix out of literal (literal ::= IDENTIFIER second, with second ::= /* empty */ | LPAREN RPAREN;), but I can't make it work.

Answer 1:

The context in which this seems to come up is in expressions like

x + x()

where the parser, after seeing x + x, can't tell whether it's supposed to reduce x + x back to exp or shift the (. In other words, it can't tell whether to interpret the expression as

x + [x()]

or as

[x + x]()

I think you can address this by adding a precedence rule that gives the open parenthesis in this particular context higher precedence than addition. That way, when the parser has both a shift and a reduce action available in this state, it knows to shift on an open parenthesis rather than reduce.

Answer 2:

Bison handles the following grammar with no shift/reduce conflicts:

%token IDENTIFIER STRING
%left IDENTIFIER
%left '('
%left '+'
%%

e      : e exp
       | exp

exp    : concat
       | literal

concat : exp '+' exp
       | '(' exp ')'

literal: IDENTIFIER
       | IDENTIFIER '(' ')'
       | STRING

It is necessary to provide a precedence declaration for IDENTIFIER in order to give a precedence to the literal: IDENTIFIER production.
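To illustrate why the IDENTIFIER declaration matters, here is a minimal Java sketch of the usual yacc-style rule for resolving a shift/reduce conflict by precedence. It is not Bison's or CUP's actual code: the class, the method, and the numeric levels are assumptions made for illustration. The levels mirror the order of the %left lines above (IDENTIFIER = 1, '(' = 2, '+' = 3; later lines bind tighter), 0 stands for "no precedence declared", and %nonassoc is ignored.

```java
// Illustrative model only; names and numeric levels are hypothetical.
class ConflictResolver {
    enum Action { SHIFT, REDUCE, UNRESOLVED }

    // rulePrec:      precedence of the production to reduce (that of its last terminal)
    // lookaheadPrec: precedence of the token that could be shifted
    // 0 means no precedence was declared for that symbol.
    static Action resolve(int rulePrec, int lookaheadPrec, boolean leftAssoc) {
        if (rulePrec == 0 || lookaheadPrec == 0) {
            return Action.UNRESOLVED;          // precedence cannot decide; the conflict is reported
        }
        if (lookaheadPrec > rulePrec) {
            return Action.SHIFT;               // lookahead binds tighter
        }
        if (lookaheadPrec < rulePrec) {
            return Action.REDUCE;              // production binds tighter
        }
        return leftAssoc ? Action.REDUCE : Action.SHIFT;  // tie: use associativity
    }

    public static void main(String[] args) {
        // literal: IDENTIFIER (level 1) against lookahead '(' (level 2)
        System.out.println(resolve(1, 2, true));
        // Same conflict when IDENTIFIER has no declared precedence
        System.out.println(resolve(0, 2, true));
        // concat: exp '+' exp (level 3) against lookahead '+' (level 3), left-associative
        System.out.println(resolve(3, 3, true));
    }
}
```

Under these assumptions the first call picks the shift, which is what lets IDENTIFIER '(' ')' be read as a call. The second call shows that without a precedence on IDENTIFIER the production has nothing to compare against, so the conflict stays.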

I found the grammar a bit odd, since it doesn't seem to allow concatenations to be parenthesized. But I'm sure there are reasons for that.

The above will work fine as long as function calls have no arguments, but it will not allow functions to be called with an argument, since that would be ambiguous. (That might be considered a good reason to not allow an invisible concatenation operator.) For what it's worth, awk, which has both functions and concatenation without an operator, solves this ambiguity lexically: an identifier followed immediately by (, without intervening whitespace, is tokenized as a FUNC_NAME, while an identifier followed by whitespace or by any symbol other than ( is tokenized as NAME.
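The lexical approach described for awk can be sketched with a small hand-written tokenizer. This is only an illustration in Java, not awk's implementation and not a JFlex/CUP scanner: the class name and the string-encoded tokens are hypothetical, and the token set is limited to what the grammar in the question uses.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative tokenizer: an identifier immediately followed by '(' becomes
// FUNC_NAME; any other identifier becomes NAME. All names are hypothetical.
class FuncNameLexer {

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < input.length()
                        && (Character.isLetterOrDigit(input.charAt(i)) || input.charAt(i) == '_')) {
                    i++;
                }
                String name = input.substring(start, i);
                // One character of lookahead, with no whitespace skipped first.
                boolean isCall = i < input.length() && input.charAt(i) == '(';
                tokens.add((isCall ? "FUNC_NAME:" : "NAME:") + name);
            } else if (c == '"') {
                int end = input.indexOf('"', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string at " + i);
                }
                tokens.add("STRING:" + input.substring(i + 1, end));
                i = end + 1;
            } else if (c == '(') {
                tokens.add("LPAREN");
                i++;
            } else if (c == ')') {
                tokens.add("RPAREN");
                i++;
            } else if (c == '+') {
                tokens.add("PLUS");
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // "f(" yields FUNC_NAME, while "x (" yields NAME followed by LPAREN
        System.out.println(tokenize("f() + x (y)"));
    }
}
```

With the two kinds of identifier arriving as different tokens, the grammar can use FUNC_NAME LPAREN RPAREN for calls and NAME for plain identifiers, so the parser never has to choose between reducing and shifting on LPAREN after an identifier.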

Another possible solution would be to require functions to be declared before use, and then use a symbol table and lexical feedback (i.e., passing information back from the parser to the lexer; in this case, the fact that a given identifier is a function).
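A rough Java sketch of that lexical-feedback idea follows. It assumes a hypothetical declaration construct whose parser action calls declareFunction; neither that construct nor these class and method names appear in the original grammar.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a symbol table shared between parser and lexer.
class SymbolTableLexer {
    private final Set<String> declaredFunctions = new HashSet<>();

    // Assumed to be called from a parser action once a function
    // declaration has been recognised (hypothetical construct).
    void declareFunction(String name) {
        declaredFunctions.add(name);
    }

    // Called by the lexer for each identifier it scans.
    String classify(String identifier) {
        return declaredFunctions.contains(identifier) ? "FUNC_NAME" : "NAME";
    }

    public static void main(String[] args) {
        SymbolTableLexer lexer = new SymbolTableLexer();
        System.out.println(lexer.classify("f"));  // not declared yet
        lexer.declareFunction("f");
        System.out.println(lexer.classify("f"));  // declared as a function
    }
}
```

The trade-off is that token classification now depends on parse order, so a function has to be declared before the lexer reaches its first use.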

Source: https://stackoverflow.com/questions/36021518/concatenation-shift-reduce-conflict

Tags

parsing

compiler-construction

context-free-grammar

shift-reduce-conflict

cup