I am working on my compilers concepts however I am a little confused... Googling got me nowhere to a definite answer.
Is SLR and LR(0) parsers one and same? If not, what
Both LR(0) and SLR(1) parsers are bottom-up, directional, predictive parsers. This means that
Both LR(0) and SLR(1) are shift/reduce parsers, meaning that they process the tokens of the input stream by placing them on a stack, and at each point either shifting a token by pushing it onto the stack or reducing some sequence of terminals and nonterminals atop the stack back to some nonterminal symbol. It can be shown that any grammar can be parsed bottom-up using a shift/reduce parser, but that parser might not be deterministic. That is, the parser may have to "guess" whether to apply a shift or reduction, and may end up having to backtrack to realize that it made the wrong choice. No matter how powerful a deterministic shift/reduce parser you construct, it will never be able to parse all grammars.
When a deterministic shift/reduce parser is used to parse a grammar that it cannot handle, it results in shift/reduce conflicts or reduce/reduce conflicts, where the parser may enter a state in which it cannot tell what action to take. In a shift/reduce conflict, it cannot tell whether it should add another symbol to the stack or perform some reduction on the top symbols of the stack. In a reduce/reduce conflict, the parser knows that it needs to replace the top symbols of the stack with some nonterminal, but it can't tell what reduction to use.
I apologize if this is a lengthy exposition, but we need this to be able to address the difference between LR(0) and SLR(1) parsing. An LR(0) parser is a shift/reduce parser that uses zero tokens of lookahead to determine what action to take (hence the 0). This means that in any configuration of the parser, the parser must have an unambiguous action to choose - either it shifts a specific symbol or applies a specific reduction. If there are ever two or more choices to make, the parser fails and we say that the grammar is not LR(0).
Recall that the two possible LR conflicts are shift/reduce and reduce/reduce. In both of these cases, there are at least two actions that the LR(0) automaton could be taking, and it can't tell which of them to use. Since at least one of the conflicting actions is a reduction, a reasonable line of attack would be to try to have the parser be more careful about when it performs a particular reduction. More specifically, let's suppose that the parser is allowed to look at the next token of input to determine whether it should shift or reduce. If we only allow the parser to reduce when it "makes sense" to do so (for some definition of "makes sense"), then we may be able to eliminate the conflict by having the automaton specifically choose to either shift or reduce in a particular step.
In SLR(1) ("Simplified LR(1)"), the parser is allowed to look at one token of lookahead when deciding whether it should shift or reduce. In particular, when the parser wants to try reducing something of the form A → w (for nonterminal A and string w), it looks at the next token of input. If that token could legally appear after the nonterminal A in some derivation, the parser reduces. Otherwise, it does not. The intuition here is that in some cases it makes no sense to attempt a reduction, because given the tokens we've seen so far and the upcoming token, there is no possible way that the reduction could ever be correct.
The only difference between LR(0) and SLR(1) is this extra ability to help decide what action to take when there are conflicts. Because of this, any grammar that can be parsed by an LR(0) parser can be parsed by an SLR(1) parser. However, SLR(1) parsers can parse a larger number of grammars than LR(0).
In practice, though, SLR(1) is still a fairly weak parsing method. More commonly, you will see LALR(1) ("Lookahead LR(1)") parsers being used. They too work by trying to resolve conflicts in an LR(0) parser, but the rules they use for resolving conflicts are far more precise than those used in SLR(1), and consequently a much larger number of grammars are LALR(1) than are SLR(1). To be a bit more specific, SLR(1) parsers try to resolve conflicts by looking at the structure of the grammar to learn more information about when to shift and when to reduce. LALR(1) parsers look at both the grammar and the LR(0) parser to get even more specific information about when to shift and when to reduce. Because LALR(1) can look at the structure of the LR(0) parser, it can more precisely identify when certain conflicts are spurious. The Linux utilities yacc
and bison
, by default, produce LALR(1) parsers.
Historically, LALR(1) parsers were typically constructed through a different method that relied on the far more powerful LR(1) parser, so you will often see LALR(1) described that way. To understand this, we need to talk about LR(1) parsers. In an LR(0) parser, the parser works by keeping track of where it might be in the middle of a production. Once it has found that it's reached the end of a production, it knows to try to reduce. However, the parser might not be able to tell whether it's in at the end of one production and the middle of another, which leads to a shift/reduce conflict, or which of two different productions it has reached the end of (a reduce/reduce conflict). In LR(0), this immediately leads to a conflict and the parser fails. In SLR(1) or LALR(1), the parser then makes the decision to shift or reduce based on the next token of lookahead.
In an LR(1) parser, the parser keeps track of additional information as it operates. In addition to keeping track of what production the parser believes is being used, it keeps track of what possible tokens might appear after that production is completed. Because the parser keeps track of this information at each step, and not just when it needs to make the decision, the LR(1) parser is substantially more powerful and precise than any of the LR(0), SLR(1), or LALR(1) parsers we've talked about so far. LR(1) is an extremely powerful parsing technique, and it can be shown using some tricky math that any language that could be parsed deterministically by any shift/reduce parser has some grammar that could be parsed with an LR(1) automaton. (Note that this does not mean that all grammars that can be parsed deterministically are LR(1); this only says that a language that could be parsed deterministically has some LR(1) grammar). However, this power comes at a price, and a generated LR(1) parser may require so much information to operate that it can't possibly be used in practice. An LR(1) parser for a real programming language, for example, might require tens to hundreds of megabytes of additional information to operate correctly. For this reason, LR(1) isn't typically used in practice, and weaker parsers like LALR(1) or SLR(1) are used instead.
More recently, a new parsing algorithm called GLR(0) ("Generalized LR(0)") has gained popularity. Rather than trying to resolve the conflicts that appear in an LR(0) parser, the GLR(0) parser instead works by trying all possible options in parallel. Using some clever tricks, this can be made to run very efficiently for many grammars. Moreover, GLR(0) can parse any context-free grammar at all, even grammars that can't be parsed by an LR(k) parser for any k. Other parsers are capable of doing this as well (for example, the Earley parser or a CYK parser), though GLR(0) tends to be faster in practice.
If you're interested in learning more, over this summer I taught an introductory compilers course and spent just under two weeks talking about parsing techniques. If you'd like to get a more rigorous introduction to LR(0), SLR(1), and a host of other powerful parsing techniques, you might enjoy my lecture slides and homework assignments about parsing. All of the course materials are available here on my personal site.
Hope this helps!