Error recovery in an LALR(1) grammar

一世执手 提交于 2019-12-10 11:23:41

问题


I'm using some parser and lexer generating tools (similar to Lex and Bison, but for C#) to generate programs that parse strings into abstract syntax trees that can later be evaluated.

I wanted to do error recovery (i.e. report in the produced abstract sentence tree that there are missing tokens and such). I had two approaches in mind to structuring the generated grammars, and I was wondering which approach was better/more flexible/wouldn't have conflicts (the .y and .lex files are generated based on a description of the calculator).

The calculator description allows the user to specify terminals/regex's and their placement for operators and the associativity. So something like:

grammar.AddTerminal("Plus", "\\+").
    AddNonTerminal(new NonTerminal("Add", Associativity.LeftToRight).
        AddTerminal("Expression").
        AddTerminal("Plus").
        AddTerminal("Expression"));

(Precedence is specified via the order that the Terminal's and NonTerminal's are added. "Add" is the name of a method that is discovered via Reflection. Basically it tells the NonTerminal what to call the operator in the abstract syntax tree.)


Approach 1: (allow the empty rule for any expression)

S -> E
E -> E + T
E -> T
T -> T * P
T -> P
P -> (E)
P -> (E [error]
P -> a
P -> @ [error]

a is a terminal. @ is empty.


Approach 2: (only allow the empty rule for the start rule)

S -> E
S -> @ [error]
E -> + [error]
E -> T + [error]
E -> + T [error]
E -> E + T
E -> T
T -> * [error]
T -> * P [error]
T -> P * [error]
T -> T * P
T -> P
P -> (E)
P -> (E [error]
P -> a

Here's an example to show a left-most derivation for a bad-input using each approach.


Input: (a +


Approach 1:

S
E
T
P
(E
(E + T
(T + T
(P + T
(a + T
(a + P
(a +

Approach 2:

S
E
T
P
(E
(T +
(P +
(a +

Approach 2 is much harder to code for (consider subtraction/unary negative operator. You can't just look at subtract A -> A - B, take out that first A and report an error on A -> - B because that's valid for the unary operator.) I coded for approach 2 this morning only to find out that I think it has grammar issues and that an empty rule as in Approach 1 makes things much simpler, code-wise, but my main concern is which approach would produce the least amount of grammar issues as programmers create calculator descriptions as described above.


回答1:


This question has been around for a while and covers a topic that is is often visited by beginners in the subject. One often find that those who have done a compilers course in their undergraduate degree know that this is one of those questions that has no easy or single answer. You might have noticed that you have two questions on the same topic, neither of which has been answered. Another question someone else posted was answered with pointers to the literature that explains why this is hard.

It is a question that has remained extant for over 50 years. If one examines the literature over time, from early conference papers, course textbooks, doctoral thesis and (today) SO, we can see regular references to the fact that this is the wrong question! (Or rather the wrong approach to the problem).

Just taking a sample from course texts over the years (random selections from my bookshelf):

Gries, D. (1970) Error Recovery and Correction - An introduction to the Literature, In Compiler Construction, An advanced Course edited by Bauer, F.L. & Eickel, J., Springer Verlag, pp.627-638.
Gries, D. (1971) Compiler Construction for Digital Computers, Wiley, pp.320-326.
Aho, A.V., Ullman, J.D. (1977) Principles of Compiler Design, Addison Wesley, pp.397-405.
Bornat, R. (1979) Understanding and Writing Compilers, Macmillan, pp.251-252.
Hanson, D. (1995) A retargetable C compiler: Design and Implementation,Addison-Wesley, pp.140-146.
Grune, D., Bal, H.E., Jacobs, C.J.H. & Langendoen, K.G. (2000) Modern Compiler Design, Wiley, pp.175-184.
Aho, A.V., Lam, M.S., Sethi, R., Ullman, J.D. (2007) Compilers: Principles, Techniques, & Tools, Pearson, Addison-Wesley, pp.283-296.

All of these concur (over a 40 year span) that your question is about using the wrong tools wrongly or going in the wrong direction. I think I'm trying to say "You can't there from here". You should start from somewhere else.

If you want something deeper, there is a whole Phd thesis:

Charles, P. (1991) A Practical method for Constructing Efficient LALR(k) Parsers with Automatic Error Recovery, New York University

Hopefully, for those who visit this question again in the future, there is a placeholder for the answers.


I note from comments that you are using MPPG, derived from CPPG. Not everyone will have used these, so I'm including a couple of links to these tools:

Managed Babel Systems Essentials
Garden Points Parser Generator
Irony .NET compiler Construction Kit
Writing your first Visual Studio Language Service



来源:https://stackoverflow.com/questions/11870125/error-recovery-in-an-lalr1-grammar

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!