Currently I'm working on my own grammar and I would like to have specific error messages for `NoViableAlternative`, `InputMismatch` and `UnwantedToken` errors.
After some research I came up with another solution. In the book "The Definitive ANTLR4 Reference", chapter 9.4 explains how to use error alternatives:
```antlr
fcall
    : ID '(' expr ')'
    | ID '(' expr ')' ')' {notifyErrorListeners("Too many parentheses");}
    | ID '(' expr {notifyErrorListeners("Missing closing ')'");}
    ;
```
These error alternatives can make an ANTLR-generated parser work a little harder to choose between alternatives, but they don't in any way confuse the parser.
I adapted this to my grammar and extended `BaseErrorListener`. The exception that `notifyErrorListeners` passes on is `null` (from `Parser.class`):
```java
public final void notifyErrorListeners(String msg) {
    this.notifyErrorListeners(this.getCurrentToken(), msg, (RecognitionException) null);
}
```
So I handled that case in my subclass of `BaseErrorListener`, like this:
```java
if (recognitionException instanceof LexerNoViableAltException) {
    message = handleLexerNoViableAltException((Lexer) recognizer);
} else if (recognitionException instanceof InputMismatchException) {
    message = handleInputMismatchException((CommonToken) offendingSymbol);
} else if (recognitionException instanceof NoViableAltException) {
    message = handleNoViableAltException((CommonToken) offendingSymbol);
} else if (Objects.isNull(recognitionException)) {
    // Handle errors specified in my grammar (error alternatives)
    message = msg;
} else {
    message = "Can't be resolved";
}
```
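If you port this dispatch to the C++ runtime (as used in the answer below), the listener receives a `std::exception_ptr` instead of a `RecognitionException`. The sketch below assumes that the C++ runtime passes an empty `exception_ptr` for `notifyErrorListeners(msg)`, just as the Java runtime passes `null`. That case has to be checked before rethrowing, because calling `std::rethrow_exception` on an empty pointer is undefined behavior. The exception types are hypothetical stand-ins for the ANTLR classes so that the sketch compiles on its own, and the returned strings are placeholders for your own `handle...` functions.

```cpp
#include <exception>
#include <stdexcept>
#include <string>

// Hypothetical stand-ins for the ANTLR runtime exception classes (not the real API).
struct RecognitionError : std::runtime_error {
  explicit RecognitionError(const std::string &what) : std::runtime_error(what) {}
};
struct NoViableAltError : RecognitionError {
  explicit NoViableAltError(const std::string &what) : RecognitionError(what) {}
};
struct InputMismatchError : RecognitionError {
  explicit InputMismatchError(const std::string &what) : RecognitionError(what) {}
};

// Picks a message based on the dynamic type of the stored exception.
std::string describeError(std::exception_ptr ep, const std::string &msg) {
  // Empty pointer: the error came from notifyErrorListeners(msg), i.e. from an
  // error alternative in the grammar, so keep the grammar's own message.
  if (!ep)
    return msg;
  try {
    std::rethrow_exception(ep);
  } catch (const NoViableAltError &) {
    return "no viable alternative";  // placeholder for a handleNoViableAlt...() helper
  } catch (const InputMismatchError &) {
    return "input mismatch";         // placeholder for a handleInputMismatch...() helper
  } catch (...) {
    return "Can't be resolved";
  }
}
```

Rethrowing and catching by type is the usual C++ way to inspect an `exception_ptr`. Derived types must be caught before any base type they share.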
I hope that helps a little.
My strategy for improving the ANTLR4 error messages is a bit different. I use a `syntaxError` override in my error listeners (I have one for the lexer and one for the parser). By using the given values and a few other things, like the `LL1Analyzer`, you can create pretty precise error messages. The lexer error listener's handling is pretty straightforward (hopefully C++ code is understandable for you):
```cpp
void LexerErrorListener::syntaxError(Recognizer *recognizer, Token *, size_t line,
                                     size_t charPositionInLine, const std::string &, std::exception_ptr ep) {
  // The passed in string is the ANTLR generated error message which we want to improve here.
  // The token reference is always null in a lexer error.
  std::string message;
  try {
    std::rethrow_exception(ep);
  } catch (LexerNoViableAltException &) {
    Lexer *lexer = dynamic_cast<Lexer *>(recognizer);
    CharStream *input = lexer->getInputStream();
    std::string text = lexer->getErrorDisplay(input->getText(misc::Interval(lexer->tokenStartCharIndex, input->index())));
    if (text.empty())
      text = " "; // Should never happen.

    switch (text[0]) {
      case '/':
        message = "Unfinished multiline comment";
        break;
      case '"':
        message = "Unfinished double quoted string literal";
        break;
      case '\'':
        message = "Unfinished single quoted string literal";
        break;
      case '`':
        message = "Unfinished back tick quoted string literal";
        break;

      default:
        // Hex or bin string?
        if (text.size() > 1 && text[1] == '\'' && (text[0] == 'x' || text[0] == 'b')) {
          message = std::string("Unfinished ") + (text[0] == 'x' ? "hex" : "binary") + " string literal";
          break;
        }

        // Something else the lexer couldn't make sense of (likely there is no rule that accepts this input).
        message = "\"" + text + "\" is no valid input at all";
        break;
    }
    owner->addError(message, 0, lexer->tokenStartCharIndex, line, charPositionInLine,
                    input->index() - lexer->tokenStartCharIndex);
  }
}
```
This code shows that we don't use the original message at all and instead examine the token text to see what's wrong. Here we mostly deal with unclosed strings.
The parser error listener is much more complicated and too large to post here. It's a combination of different sources to construct the actual error message:
- `Parser.getExpectedTokens()`: uses the `LL1Analyzer` to get the next possible lexer tokens from a given position in the ATN (the so-called follow-set). It looks through predicates, however, which might be a problem if you use them.
- Identifiers & keywords: certain keywords are often allowed as normal identifiers in specific situations. This produces follow-sets containing keywords that are actually meant to be identifiers, so an extra check is needed to avoid showing them as expected values.
- Parser rule invocation stack: during the call to the error listener, the parser holds the current parser rule context (`Parser.getRuleContext()`), which you can use to walk up the invocation stack to find rule contexts that give you more specific information about the error location (for example, walking up from a `*` match to a hypothetical `expr` rule tells you that an expression is actually expected at this point).
- The given exception: if it is null, the error is about a single missing or unwanted token, which is pretty easy to handle. If it is set, you can examine it for further details. Worth mentioning here is that we don't use the content of the exception (it's pretty sparse anyway); instead we use the values that were collected previously. The most common exception types are `NoViableAlt` and `InputMismatch`, both of which you can translate either to "input is incomplete" when the error position is EOF, or to something like "input is not valid at this position". Both can then be enhanced with an expectation constructed from the rule invocation stack and/or the follow-set, as mentioned (and shown in the image) above.
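The parts listed above can be combined roughly as follows. This is a minimal, self-contained sketch of the idea, not the author's (unpublished) parser listener. `RuleFrame` is a hypothetical stand-in for a parser rule context that keeps only a rule name and a parent pointer. The token names are assumed to be display names already taken from the follow-set, and `IDENTIFIER`, `SELECT` and the rule descriptions are made-up examples.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a parser rule context.
struct RuleFrame {
  std::string ruleName;
  const RuleFrame *parent;
};

// Rule invocation stack: walk up from the current context and return the description
// of the first enclosing rule that has one (e.g. "expr" -> "an expression").
std::string expectationFromStack(const RuleFrame *ctx,
                                 const std::map<std::string, std::string> &ruleDescriptions) {
  for (; ctx != nullptr; ctx = ctx->parent) {
    auto it = ruleDescriptions.find(ctx->ruleName);
    if (it != ruleDescriptions.end())
      return it->second;
  }
  return "";
}

// Identifiers & keywords: if an identifier is expected, drop the keywords that are only
// in the follow-set because they may also be used as identifiers.
std::vector<std::string> cleanFollowSet(const std::vector<std::string> &expected,
                                        const std::set<std::string> &keywordsUsableAsIdentifiers,
                                        const std::string &identifierToken) {
  bool identifierExpected =
      std::find(expected.begin(), expected.end(), identifierToken) != expected.end();
  if (!identifierExpected)
    return expected;  // Keywords here are genuinely expected as keywords.
  std::vector<std::string> result;
  for (const std::string &name : expected)
    if (keywordsUsableAsIdentifiers.count(name) == 0)
      result.push_back(name);
  return result;
}

// Formats alternatives as "a", "a or b", "a, b or c".
std::string joinAlternatives(const std::vector<std::string> &names) {
  std::string out;
  for (size_t i = 0; i < names.size(); ++i) {
    if (i > 0)
      out += (i + 1 == names.size()) ? " or " : ", ";
    out += names[i];
  }
  return out;
}

// NoViableAlt / InputMismatch: base message depends on whether the error is at EOF.
std::string describeNoViableOrMismatch(bool atEof, const std::string &expectation) {
  std::string message = atEof ? "Input is incomplete" : "Input is not valid at this position";
  if (!expectation.empty())
    message += ", expected " + expectation;
  return message;
}

// Prefers a rule-level expectation and falls back to the cleaned follow-set.
std::string buildMessage(bool atEof, const RuleFrame *ctx,
                         const std::map<std::string, std::string> &ruleDescriptions,
                         const std::vector<std::string> &followSet,
                         const std::set<std::string> &keywordsUsableAsIdentifiers,
                         const std::string &identifierToken) {
  std::string expectation = expectationFromStack(ctx, ruleDescriptions);
  if (expectation.empty())
    expectation = joinAlternatives(cleanFollowSet(followSet, keywordsUsableAsIdentifiers, identifierToken));
  return describeNoViableOrMismatch(atEof, expectation);
}
```

In a real listener, the follow-set would come from `getExpectedTokens()`, the context from the current rule context, and `atEof` from checking whether the offending token is EOF. Consulting the rule stack before the follow-set is only one possible design choice. A rule-level expectation such as "an expression" often reads better than a long token list, but you may want to combine both.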