How can it be shown that no LL(1) grammar can be ambiguous?
I know what an ambiguous grammar is, but I could not prove the above theorem/lemma.
Here's my first draft at a proof. It might need some fine tuning, but I think it covers all the cases. I think many solutions are possible. This is a direct proof.
(Side note: it is a pity SO doesn't support math, such as in LaTeX.)
Proof
Let T and N be the sets of terminal and non-terminal symbols.
Define the following, where s is any string of symbols, x is a terminal, and Y, Z are (possibly empty) strings of symbols:
MaybeEmpty(s) = true <=> s ->* empty
First(s) = the set of all x for which there exists Y such that s ->* xY
Follow(A) = the set of all x for which there exist Y, Z such that S ->* YAxZ
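To make these three definitions concrete, here is a minimal Python sketch that computes them by fixed-point iteration. The grammar encoding (a dict from non-terminal to a list of tuples) and the name `compute_sets` are my own assumptions for illustration. Follow is computed exactly as defined above, so no end-of-input marker is added.

```python
def compute_sets(grammar):
    """Return (nullable, first, follow) for a grammar given as
    {nonterminal: [tuple_of_symbols, ...]}; () is the empty production.
    Any symbol that is not a key of `grammar` is treated as a terminal."""
    nullable = set()                      # non-terminals A with MaybeEmpty(A)
    first = {a: set() for a in grammar}   # First(A) per non-terminal
    follow = {a: set() for a in grammar}  # Follow(A) per non-terminal

    def first_of(seq):
        """First(seq) under the current estimates, plus MaybeEmpty(seq)."""
        out = set()
        for sym in seq:
            if sym not in grammar:        # terminal: it starts the string
                out.add(sym)
                return out, False
            out |= first[sym]
            if sym not in nullable:
                return out, False
        return out, True                  # every symbol could vanish

    changed = True
    while changed:                        # iterate until nothing grows
        changed = False
        for a, prods in grammar.items():
            for rhs in prods:
                f, empty = first_of(rhs)
                if empty and a not in nullable:
                    nullable.add(a)
                    changed = True
                if not f <= first[a]:
                    first[a] |= f
                    changed = True
                for i, sym in enumerate(rhs):
                    if sym in grammar:
                        # whatever can follow sym inside this production
                        f, empty = first_of(rhs[i + 1:])
                        new = f | (follow[a] if empty else set())
                        if not new <= follow[sym]:
                            follow[sym] |= new
                            changed = True
    return nullable, first, follow


# Hypothetical toy grammar: S -> A b, A -> a | empty
toy = {"S": [("A", "b")], "A": [("a",), ()]}
nullable, first, follow = compute_sets(toy)
print(nullable, first, follow)
```

For the toy grammar, A is the only non-terminal that can derive empty, and the only terminal that can follow A is b.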
Note that a grammar is LL(1) if the following holds for every pair of distinct productions A -> B and A -> C (taken in either order):
1. (not MaybeEmpty(B)) or (not MaybeEmpty(C))
2. (First(B) intersect First(C)) = empty
3. MaybeEmpty(C) => (First(B) intersect Follow(A)) = empty
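The three rules can be checked mechanically. The following is a small Python sketch, assuming the MaybeEmpty flag and First set of each alternative of A, and Follow(A), are already known. The function name `ll1_violations` and the input layout are hypothetical and chosen only for this example.

```python
from itertools import combinations


def ll1_violations(alternatives, follow_a):
    """Check rules 1-3 for the alternatives of a single non-terminal A.

    alternatives: {label: (maybe_empty, first_set)} for each right-hand side
    follow_a:     Follow(A) as a set of terminals
    Returns a list of (rule_number, label_B, label_C) for every broken rule.
    """
    problems = []
    for (b, (b_empty, b_first)), (c, (c_empty, c_first)) in combinations(
        alternatives.items(), 2
    ):
        if b_empty and c_empty:                 # rule 1
            problems.append((1, b, c))
        if b_first & c_first:                   # rule 2
            problems.append((2, b, c))
        if c_empty and (b_first & follow_a):    # rule 3
            problems.append((3, b, c))
        if b_empty and (c_first & follow_a):    # rule 3, roles swapped
            problems.append((3, c, b))
    return problems


# A -> a | empty with Follow(A) = {b}: all three rules hold
ok = ll1_violations({"a": (False, {"a"}), "empty": (True, set())}, {"b"})

# Same alternatives but Follow(A) = {a}: rule 3 fails
bad = ll1_violations({"a": (False, {"a"}), "empty": (True, set())}, {"a"})

# E -> E + E | a (a classic ambiguous grammar): both alternatives start with a
amb = ll1_violations({"E + E": (False, {"a"}), "a": (False, {"a"})}, set())
print(ok, bad, amb)
```

The last call illustrates the contrapositive of the theorem on one example: the ambiguous grammar E -> E + E | a already fails rule 2, so it is not LL(1).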
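Consider a grammar which is LL(1), with productions A -> B and A -> C, and suppose for contradiction that it is ambiguous.
That is to say, there is some string of terminals TZ which admits multiple derivations by distinct parse trees. (From here on, T denotes a prefix string of terminals rather than the set defined above.)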
Suppose that the leftmost derivation reaches S ->* TAY ->* TZ, where A is the leftmost non-terminal and Y is the rest of the sentential form. The next step may be either TAY -> TBY or TAY -> TCY.
Thus the grammar can only be ambiguous if, at some such step, both BY ->* Z and CY ->* Z.
(Note that since A is an arbitrary non-terminal, if no such case exists, the grammar is unambiguous.)
Case 1: Z = empty
Here BY ->* empty and CY ->* empty would require both B and C to derive empty. By rule 1 of LL(1) grammars, at most one of B and C can derive empty (non-ambiguous case).
Case 2: Z non-empty, and neither B nor C derives empty
By rule 2 of LL(1) grammars, at most one of B and C can permit further derivation, because the leading terminal of Z cannot be in both First(B) and First(C) (non-ambiguous case).
Case 3: Z non-empty, and either MaybeEmpty(B) or MaybeEmpty(C)
Note that by rule 1 of LL(1) grammars, B and C cannot both derive empty. Suppose therefore, without loss of generality, that MaybeEmpty(C) is true and MaybeEmpty(B) is false.
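This gives two sub-cases, depending on what C derives in this derivation.
Case 3a: CY ->* Y (C derives empty); and Case 3b: CY ->* DY, where D is a non-empty string of terminals derived from C.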
In 3a we must choose between BY ->* Z and CY ->* Y ->* Z, but notice that First(Y) subset-of Follow(A), so the leading terminal of Z is in Follow(A). Since Follow(A) does not intersect First(B) (rule 3) and B does not derive empty, only one derivation can proceed (non-ambiguous).
In 3b we must choose between BY ->* Z and CY ->* DY ->* Z, but notice that First(D) subset-of First(C), so the leading terminal of Z is in First(C). Since First(C) does not intersect First(B) (rule 2) and B does not derive empty, only one derivation can proceed (non-ambiguous).
Thus in every case the derivation can only be expanded by one of the available productions. Therefore the grammar is not ambiguous.