How can it be shown that no LL(1) grammar can be ambiguous?
I know what is ambiguous grammar but could not prove the above theorem/lemma.
Here's my first draft at a proof. It might need some fine tuning, but I think it covers all the cases. I think many solutions are possible. This is a direct proof.
(Side note: it is a pity SO doesn't support math, such as in LaTeX.)
Proof
Let T and N be the sets of terminal and non-terminal symbols.
Let the following hold
MaybeEmpty(s) = true <=> s ->* empty
First(s) = X containing all x for which there exists Y such that s ->* xY
Follow(A) = X containing all x for which there exists Y,Z such that S ->* YAxZ
Note that a grammar is LL(1) if the following holds for every pair of productions A -> B and A -> C:
1. (not MaybeEmpty(B)) or (not MaybeEmpty(C))
2. (First(B) intersect First(C)) = empty
3. MaybeEmpty(C) => (First(B) intersect Follow(A)) = empty
Consider a language with is LL(1), with A -> B
and A -> C
.
That is to say there is some string of terminals TZ which admits multiple derivations by distinct parse trees.
Suppose that the left derivation reaches S ->* TAY ->* TZ
. The next step may be either TAY -> TBY
, or TAY -> TCY
.
Thus the language is ambiguous if both BY ->* Z
and CY ->* Z
.
(Note that since A is an arbitrary non-terminal, if no such case exists, the language is non-ambiguous.)
Case 1: Z = empty
By rule 1 of LL(1) grammars, at most one of B and C can derive empty (non-ambiguous case).
Case 2: Z non-empty, and neither B nor C derive empty
By rule 2 of LL(1) grammars, at most one of B and C can permit further derivation because the leading terminal of Z cannot be in both First(B)
and First(C)
(non-ambiguous case).
Case 3: Z non-empty, and either MaybeEmpty(B)
or MaybeEmpty(C)
Note the by rule 1 of LL(1) grammars, B and C cannot both derive empty. Suppose therefore that MaybeEmpty(C)
is true.
This gives two sub-cases.
Case 3a: CY -> Y
; and Case 3b: CY ->* DY
, where D is not empty.
In 3a we must choose between BY ->* Z
and CY -> Y ->* Z
, but notice that First(Y) subset-of Follow(A)
. Since Follow(A)
does not intersect First(B)
, only one derivation can proceed (non-ambiguous).
In 3b we must choose between BY ->* Z
and CY ->* DY ->* Z
, but notice that First(D) subset-of First(C)
. Since First(C)
does not intersect First(B)
, only one derivation can proceed (non-ambiguous).
Thus in every case the derivation can only be expanded by one of the available productions. Therefore the grammar is not ambiguous.
Prove that no ambiguous grammar can be an LL(1) grammar. For hints, see http://www.cse.ohio-state.edu/~rountev/756/pdf/SyntaxAnalysis.pdf, slides 18-20. Also see http://seclab.cs.sunysb.edu/sekar/cse304/Parse.pdf, pg. 11 and preceding.
I think it's nearly a direct result of the definition of LL(1). Try proof by contradiction; assume that you have an LL(1) grammar that is ambiguous and look for something you can show to be true and not true. As a starting point "what do you always know as you process input?"
As this seems like a homework problem and I actually haven't finished the problem any more than I sketched out above, I'll stop there.