LR(1) parsing table with epsilon productions

问题

I'm having trouble building the collection of sets of items for LR(1) parsers with a grammar containing epsilon productions. For example, given the following grammar (where eps stands for epsilon)

S -> a S U
U -> b
  |  eps

State0 would be

S' -> .S, $
S -> .a S U, $

Moving with 'a' from State0 would give the following state, let's call it State2

S -> a .S U, $
S -> .a S U, $/???

In order to have the lookahead for the second item of State2 I need to calculate FIRST(U$). I know that FIRST(U) = {'b', eps}. My first question is: the lookaheads of the second item of State2 are $ and 'b'? Since U can be eps, my brain tells me that I can have $ as a lookahead as well, not just 'b'. It would have been just 'b' if FIRST(U) would have been just {'b'}. Is that correct?

Second question: at some point I will have a state as the following one

S -> a S .U, $
U -> .b, $
U -> .eps, $

What do I do here? Do I need to move with eps and have a set with the item U -> eps., $? What if I have another terminal as lookahead, i.e. X -> .eps, a/$? And if I move, ending up having a set of the form X -> eps., $, do I reduce?

And more: do I need to insert eps in the parse table as a symbol?

Thanks

回答1:

FIRST(U$) means "the set of symbols which could be first in a derivation of U$". Clearly, if U can derive the empty string, $ must be part of this set. The end-of-input marker $ ensures that we never have to worry about epsilons in the FIRST sets. (If we were doing LR(k) instead of LR(1), we would use k end markers so that all the strings in FIRST_k had length k.
The item associated with U → (or with U → ε if you insist) is U → • . In other words, it is reducible and should trigger a reduce action on matching lookahead.
ε is not a symbol; we only use it (sometimes) to make the empty string visible. But the empty string is empty.

来源：https://stackoverflow.com/questions/50393772/lr1-parsing-table-with-epsilon-productions

标签

parsing

compiler-theory

lr1