Question
This question is about how different kinds of ANTLR4 lexer errors are reported to custom lexer and parser error listeners. From my experiments, it seems that not all ANTLR4 errors are reported internally to both the default and custom lexer/parser error listeners. Some ANTLR4 errors are only reported on the console. Consequently, there is no way for me to detect such errors programmatically.
EXAMPLES
I have two working grammars (one simple, one complex) and a common ANTLR4 setup in Visual Studio (C#). In all examples below, the only things I change are 1) the bad input string and 2) which grammar I use. All the other code (the unit test, the helper class that registers the error listeners, and the custom error listener class) remains exactly the same, untouched. Everything regenerates, compiles, and executes as expected (except for the issue in this post).
I have a working custom error listener that I add to both the lexer and parser error listener list as shown below. I do not remove the default (console output) listeners because I want to know when any error is triggered.
// add my custom ErrorListener (same object) into the lexer and parser error listener lists
helloLexer.AddErrorListener(MyErrorListener.Instance);
Parser.AddErrorListener(MyErrorListener.Instance);
When I give some bad input to the first simple grammar, I get error messages from both the default lexer listener and my custom lexer listener as shown below. From this, I conclude that my custom listener works and has been installed properly into the common code.
Using Hello grammar and bad input.
Bad input: "h9ell%[o world"
Both the custom and default lexer error listeners were invoked.
My custom LEXER error listener printed the first three errors.
My custom PARSER error listener printed no errors.
The default error listener printed the other four errors.
A lexer or parser error occurred.
Custom error listener errors:
Line 1, 0-offset1: token recognition error at: '9' '9ell%[o world'
Line 1, 0-offset5: token recognition error at: '%' '%[o world'
Line 1, 0-offset6: token recognition error at: '[' '[o world'
Default error listener console errors:
line 1:1 token recognition error at: '9'
line 1:0 mismatched input 'h' expecting 'hello'
line 1:5 token recognition error at: '%'
line 1:6 token recognition error at: '['
Notice the type of lexer errors being successfully reported to the default and custom listeners. Both of them report "token recognition" errors, but only the default lexer error listener reports the "mismatched input" error.
Now, using the second (more complex) working grammar with bad input, I generate a different type of error. I don't know if it is a lexer error, a parser error, or some special kind of error. The following output shows that the default error listener gets the message, but neither my custom lexer error listener nor my custom parser error listener is called.
Using SendKeys grammar and a bad input string.
Bad input: "h9ell%[o world"
The custom error listener was not invoked (breakpoint not hit).
The same custom error listener instance was registered for both lexer and parser.
Default error listener console errors:
line 1:14 extraneous input '<EOF>' expecting {'(', ']', LCOMMA, LCARET,...}
The output above indicates that the calling code is not walking down the list of installed lexer/parser error listeners to call them all. Instead, I guess that the default error handler is somehow being called directly?
Notice the type of (lexer? parser?) error being reported by the default listener: "extraneous input."
Some further experiments showed that none of the following errors is passed to my custom lexer or parser listeners:
- extraneous input <EOF>
- no viable alternative at input
- mismatched input (from the first example)
- there may also be others that I don't know about
SendKeys grammar bad input "h9ell%[o world"
line 1:14 extraneous input '<EOF>' expecting {'(', ']', LCOMMA,
SendKeys grammar bad input "h9ell%)o world"
line 1:6 no viable alternative at input '%)'
SendKeys grammar bad input "alt-ctl-shf-c"
line 1:18 no viable alternative at input 'alt-ctl-'
Q1. How can I receive error notifications for these kinds of errors programmatically if my working, registered lexer and parser error listeners are not called when the errors occur?
Q2. If it is not possible to receive notifications via the registered error listeners (since they are never called), is there some other way that I can detect bad input strings programmatically during the parse?
Thank you
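Regarding Q2, one possibility that may be worth trying is the parser's own error counter. The following is only a sketch, assuming that the Parser class in Antlr4.Runtime exposes a NumberOfSyntaxErrors property (the C# counterpart of Java's getNumberOfSyntaxErrors()) and that it counts parser errors only, not lexer token recognition errors. SyntaxErrorCheck and ParsesCleanly are hypothetical names; HelloLexer and HelloParser are the classes generated from the Hello grammar in this post.

```csharp
using Antlr4.Runtime;

// Hypothetical helper: asks the parser how many syntax errors it recorded.
public static class SyntaxErrorCheck
{
    // Returns true when the parser recorded no syntax errors for the given text.
    // Assumption: lexer "token recognition" errors are not included in this count.
    public static bool ParsesCleanly(string text)
    {
        var lexer = new HelloLexer(new AntlrInputStream(text));
        var parser = new HelloParser(new CommonTokenStream(lexer));

        // run the parser from the top rule; errors are counted as they are reported
        parser.toprule();

        return parser.NumberOfSyntaxErrors == 0;
    }
}
```

Because this does not depend on any listener being called, it could serve as a cross-check alongside the custom listener rather than as a replacement for it.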
UPDATE
As requested, here is complete code for the simple grammar, unit test, the parser helper class, and the custom error listener. This should be enough to reproduce the first set of error messages shown in this post.
I am using the following NuGet packages in Visual Studio 16.7.5:
Antlr4 4.6.6
Antlr4.CodeGenerator 4.6.6
Antlr4.Runtime 4.6.6
grammar Hello;
// match one or more tokenpairs
toprule : (tokenpair)+ EOF ;
// a tokenpair has one child rule and one terminal token ID
tokenpair : helloworld ID ;
// this rule has two terminal tokens
helloworld : 'hello' 'world' ;
// ID is upper or lower characters and underscore
ID : [a-zA-Z_]+ ;
// skip all whitespace characters
WS : [ \t\r\n]+ -> skip ;
[TestMethod()]
public void MyParserHelperTest() {
    MyParserHelper.ParseText("h9ell%[o world");
    //MyParserHelper.ParseText("ctl-alt-shf-a alt-ctl-shf-c");
    if (MyParserHelper.ParseOk) {
        Dprint($"No custom lexer or parse errors were received.\n");
    }
    else {
        MyParserHelper.PrintErrors();
        Dprint($"\nDefault error listener console errors:");
    }
}

// member of MyParserHelper, shown separately here
public static void PrintErrors() {
    Console.WriteLine("Custom listener lexer or parser errors:");
    foreach (var errstring in ParseErrors) {
        Console.WriteLine(errstring);
    }
}
using System;
using System.Collections.Generic;
using Antlr4.Runtime;

public static class MyParserHelper
{
    public static bool ParseOk = true;
    public static List<string> ParseErrors = new List<string>();
    public static HelloParser Parser;

    public static HelloParser ParseText(string text) {
        try {
            var inputStream = new AntlrInputStream(text);
            var helloLexer = new HelloLexer(inputStream);
            var commonTokenStream = new CommonTokenStream(helloLexer);
            var helloParser = new HelloParser(commonTokenStream);
            Parser = helloParser;
            // install the custom ErrorListener into the lexer and parser
            // (the default console listeners are kept on purpose)
            //helloLexer.RemoveErrorListeners();
            helloLexer.AddErrorListener(MyErrorListener.Instance);
            //Parser.RemoveErrorListeners();
            Parser.AddErrorListener(MyErrorListener.Instance);
            // run the parser from the top rule
            var ctx = Parser.toprule();
            if (ParseErrors.Count > 0)
                Console.WriteLine("A lexer or parser error occurred.");
        }
        catch (Exception ex) {
            // if parsing throws, record the exception message as an error
            ParseOk = false;
            ParseErrors.Add(ex.Message);
        }
        return Parser;
    }
}
public class MyErrorListener : BaseErrorListener, IAntlrErrorListener<int>
{
    public static readonly MyErrorListener Instance = new MyErrorListener();

    public void SyntaxError(IRecognizer recognizer, int offendingSymbol, int line,
        int offset, string msg, RecognitionException e) {
        // build an error message from the input stream
        // the stream could be a whole file, so only print what you need
        var input = recognizer.InputStream.ToString();
        var maxLength = Math.Min(20, input.Length - offset);
        var badtoken = input.Substring(offset, maxLength);
        var errmsg = "Line " + line + ", 0-offset" + offset + ": " + msg + $" '{badtoken}'";
        MyParserHelper.ParseErrors.Add(errmsg);
        MyParserHelper.ParseOk = false;
    }
}
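One thing that may be worth checking in the listener above: it declares only the SyntaxError overload whose offending symbol is an int, which is the form the lexer calls. The parser side of BaseErrorListener takes an IToken offending symbol, and if that virtual method is not overridden, parser errors would land in the base class's empty implementation. The following is a minimal sketch of a listener that handles both forms, assuming the Antlr4.Runtime 4.6.6 signatures used in this post (other runtime versions may differ). CollectingErrorListener and its Errors list are hypothetical names.

```csharp
using System.Collections.Generic;
using Antlr4.Runtime;

// Hypothetical listener that records both lexer and parser errors.
public class CollectingErrorListener : BaseErrorListener, IAntlrErrorListener<int>
{
    public readonly List<string> Errors = new List<string>();

    // Lexer form: the offending symbol is a character code (int).
    public void SyntaxError(IRecognizer recognizer, int offendingSymbol, int line,
        int charPositionInLine, string msg, RecognitionException e)
    {
        Errors.Add($"lexer {line}:{charPositionInLine} {msg}");
    }

    // Parser form: the offending symbol is a token.
    // Assumption: this overrides the empty virtual method in BaseErrorListener.
    public override void SyntaxError(IRecognizer recognizer, IToken offendingSymbol, int line,
        int charPositionInLine, string msg, RecognitionException e)
    {
        Errors.Add($"parser {line}:{charPositionInLine} {msg}");
    }
}
```

Registration would stay the same as in the helper class: one instance passed to AddErrorListener on both the lexer and the parser.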
To get Visual Studio to automatically recompile the grammar file and generate the lexer/parser files, set the grammar file properties in VS to those shown in the image below.
Source: https://stackoverflow.com/questions/64689420/antlr4-errors-not-being-reported-to-custom-lexer-parser-error-listeners