When I run this bison code in Ubuntu Linux I get these warnings:
- shift/reduce conflict [-Wconflicts-sr]
- reduce/reduce conflicts [-Wconflicts-rr]
The reduce/reduce conflicts are because you have two non-terminals which exist only to gather together different types:
typos_dedomenwn: T_int
| T_bool
| T_string;
typos_synartisis: T_int
| T_bool
| T_string;
Where these non-terminals are used, it is impossible for the parser to know which one applies: it has to reduce the type token to one of them right away, but it cannot tell which is correct until further along in the declaration. However, it doesn't matter. You could just define a single typos non-terminal, and use it throughout:
typos: T_int
| T_bool
| T_string;
orismos_metavlitwn: typos lista_metavlitwn T_semic;
kefalida_synartisis: typos T_id T_openpar lista_typikwn_parametrwn T_closepar
| typos T_id T_openpar T_closepar;
typikes_parametroi: typos T_ampersand T_id;
The shift/reduce conflict is the classic problem with "C" style if statements. These statements are difficult to describe in a way which is not ambiguous. Consider:
if (expr1) if (expr2) statement1; else statement2;
We know that the else must match the second if, so the above is equivalent to:
if (expr1) { if (expr2) statement1; else statement2; }
But the grammar also matches the other possible parse, equivalent to:
if (expr1) { if (expr2) statement1; } else statement2;
There are three possible solutions to this problem:
Do nothing. Bison does the right thing here, by design: it always prefers "shift" over "reduce". What that means is that if an else could match an open if statement, bison will always do that, rather than holding onto the else to match some outer if statement. There is a pretty good description of this in the Dragon book, amongst other places.
The problem with this solution is that you still end up with a warning about shift/reduce conflicts, and it is hard to distinguish between "OK" conflicts and newly-created "not OK" conflicts. Bison provides the %expect declaration so you can tell it how many conflicts you expect, which will suppress the warning if the right number are found, but that is still pretty fragile.
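As a rough sketch of what the %expect approach looks like: the grammar below is a cut-down stand-in for yours (T_id replaces geniki_ekfrasi and all the other statement types, which is my simplification, not your grammar), and it assumes the dangling else is the only conflict present.

```yacc
/* Sketch only: assumes the dangling else is the grammar's single conflict. */
%expect 1

%token T_if T_else T_openpar T_closepar T_id T_semic

%%

entoli: entoli_if
      | T_id T_semic            /* placeholder for the other statements */
      ;

entoli_if: T_if T_openpar T_id T_closepar entoli
         | T_if T_openpar T_id T_closepar entoli T_else entoli
         ;
```

If a later change to the grammar introduces a second shift/reduce conflict, the count no longer matches and bison complains again, but it does not tell you which conflict is the new one.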
Use precedence declarations. These are described in the bison manual, and their use in solving the dangling else problem is a running example in that chapter. In your case, it would look something like this:
%precedence T_then /* Fake terminal, needed for %prec */
%precedence T_else
/* ... */
%%
/* ... */
entoli_if: T_if T_openpar geniki_ekfrasi T_closepar entoli T_else entoli
| T_if T_openpar geniki_ekfrasi T_closepar entoli %prec T_then ;
Here, I have eliminated the unnecessary non-terminal else_clause because it hides the else token. If you wanted to keep it, for whatever reason, you would need to add a %prec T_else to the end of the entoli_if production which uses it.
The %precedence declaration is only available from bison 3.0 onwards. If you have an earlier version of bison, you can use the %nonassoc declaration instead, but this may hide some other errors.
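For an older bison, the %nonassoc fallback might look roughly like this. Again, this is a reduced sketch: T_id stands in for geniki_ekfrasi and for the other statement types, which is an assumption on my part.

```yacc
/* Sketch for bison older than 3.0. Later declarations bind tighter,
   so T_else outranks T_then and the parser shifts the else. */
%nonassoc T_then   /* fake terminal, used only with %prec */
%nonassoc T_else

%token T_if T_openpar T_closepar T_id T_semic

%%

entoli: entoli_if
      | T_id T_semic            /* placeholder for the other statements */
      ;

entoli_if: T_if T_openpar T_id T_closepar entoli T_else entoli
         | T_if T_openpar T_id T_closepar entoli %prec T_then
         ;
```

Unlike %precedence, %nonassoc also declares an associativity, so it can silently resolve conflicts between symbols at the same level that you would rather have been warned about.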
Fix the grammar. It is actually possible to make an unambiguous grammar, but it is a bit of work.
The important point is that in:
if (expr) statement1 else statement2
statement1 cannot be an unmatched if statement. If statement1 is an if statement, it must include an else clause; otherwise, the else in the outer if would match the inner if. And that applies recursively to any trailing statements in statement1, such as
if (e2) statement2;
else if (e3) statement3
else /* must be present */ statement;
We can express this by dividing statements into "matching" statements (where all if are matched by else) and "non-matching" statements. (I haven't tried to preserve the Greek non-terminal names here; sorry. You'll have to adapt the idea to your grammar.)
statement: matching_statement | non_matching_statement ;
matching_statement: call_statement | assignment_statement | ...
| matching_if_statement ;
non_matching_statement: non_matching_if_statement
/* might be others, see below */ ;
if_condition: "if" '(' expression ')' ;
matching_if_statement:
if_condition matching_statement "else" matching_statement ;
non_matching_if_statement:
if_condition statement
| if_condition matching_statement "else" non_matching_statement
;
In C, there are other compound statements which can end with a statement (while, for). Each of these will also have a "matching" and "non-matching" version, depending on whether the final statement is matching or non-matching:
while_condition: "while" '(' expression ')' ;
matching_while_statement: while_condition matching_statement ;
non_matching_while_statement: while_condition non_matching_statement ;
As far as I can see, this does not apply to your language, but you might want to extend it in the future to include such statements.
Bison allows you to use single-character tokens as themselves, surrounded by single quotes. So instead of declaring T_openpar and then writing verbose rules which use it, you can just write '('; you don't even need to declare it. (In your flex, or other, scanner, you would just return '('; instead of return T_openpar, which is why you don't need to declare the token.) This usually makes grammars more readable.
Bison also lets you specify a human-readable name for a token, which can also make grammars more readable. (This feature is not in all yacc derivatives, but it is pretty common.) For example, you can give names to the if and else tokens as follows:
%token T_if "if"
%token T_else "else"
and then you could use the quoted strings in your grammar rules. (I did that in my last example for the dangling-else problem.) In the flex scanner, you still need to use the token symbols T_if and T_else.
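Putting the two token suggestions together, an if rule could be written along these lines. This is only a sketch: T_id stands in for geniki_ekfrasi and the other statement types, and the %precedence lines assume bison 3.0 or later.

```yacc
%token T_if   "if"
%token T_else "else"
%token T_id

%precedence T_then   /* fake terminal, used only with %prec */
%precedence T_else

%%

entoli: entoli_if
      | T_id ';'                /* placeholder for the other statements */
      ;

/* '(' , ')' and ';' need no %token declaration; the scanner is assumed
   to return the character code itself, e.g. return '('; */
entoli_if: "if" '(' T_id ')' entoli "else" entoli
         | "if" '(' T_id ')' entoli %prec T_then
         ;
```

The scanner side is unchanged for the keywords: it still returns T_if and T_else, and only the single-character tokens are returned as themselves.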
If you have a two-character token like &&, it is usually better if the scanner recognizes it and returns a single token, instead of the parser recognizing two consecutive & tokens. In the second case, the parser will recognize:
boolean_expr1 & & boolean_expr2
as though it had been written
boolean_expr1 && boolean_expr2
although the first one was most likely an error which should be reported.
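On the grammar side, treating && as one token could look something like the following sketch. The names T_and, boolean_expr and boolean_term are hypothetical, and T_id is a placeholder operand.

```yacc
/* Assumed scanner rule (flex), not shown in your question:
       "&&"   { return T_and; }
   With no rule for a lone '&', the input "a & & b" is rejected
   instead of being quietly accepted as "a && b". */
%token T_and "&&"
%token T_id

%%

boolean_expr: boolean_term
            | boolean_expr "&&" boolean_term
            ;

boolean_term: T_id ;
```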
Bison is a bottom-up LALR(1) parser generator. It is not necessary to remove left-recursion. Bottom-up parsers prefer left-recursion, and left-recursive grammars are usually more accurate and easier to read. For example, it is better all round to declare:
apli_ekfrasi: aplos_oros
| apli_ekfrasi '+' aplos_oros
| apli_ekfrasi '-' aplos_oros;
than to use LL-style repeated suffixes (loop7 in your grammar). The left-recursive grammar can be parsed without extending the parser stack, and more accurately represents the syntactic structure of the expression, making parser actions easier to write. (This advice comes straight from the bison manual: "you should always use left recursion, because it can parse a sequence of any number of elements with bounded stack space.")
There are a number of other places in your grammar which you might want to revisit.
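To illustrate why the left-recursive form makes actions easier, here is a small sketch with semantic actions attached. T_num is a hypothetical token whose integer value the scanner is assumed to place in yylval; the semantic type is bison's default, int.

```yacc
%token T_num   /* hypothetical: integer literal, value set by the scanner */

%%

/* Each reduction combines the value built so far ($1) with the next
   operand ($3), so "10 - 4 - 3" is evaluated as (10 - 4) - 3. */
apli_ekfrasi: aplos_oros                   { $$ = $1; }
            | apli_ekfrasi '+' aplos_oros  { $$ = $1 + $3; }
            | apli_ekfrasi '-' aplos_oros  { $$ = $1 - $3; }
            ;

aplos_oros: T_num ;
```

With a right-recursive or repeated-suffix formulation, the same input naturally groups as 10 - (4 - 3), and the actions have to work around that to get the usual left-to-right result.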