How to stop ANTLR from suppressing syntax errors?

旧城冷巷雨未停 submitted on 2019-12-06 09:24:18

... [O]nce it's done parsing I want to get an exception or some kind of indication that the input wasn't syntactically valid; that way I can stop the compilation...

You can call getNumberOfSyntaxErrors on both the lexer and the parser after parsing to determine whether ANTLR silently recovered from any errors. This doesn't tell you what those errors were, obviously, but I think these methods address the "once it's done parsing ... stop the compilation" part of your question.
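To make the check-after-parsing idea concrete without a generated grammar, here is a minimal plain-Java sketch. `ToyParser` and `CompileDriver` are hypothetical stand-ins, not ANTLR classes: the toy parser imitates the report-and-continue behavior described in the question by printing a mismatch to stderr, counting it, and carrying on, and the driver consults `getNumberOfSyntaxErrors()` once parsing returns. With real ANTLR v3 generated classes, the same check would be made on the lexer and parser instances.

```java
import java.util.List;

/** Hypothetical toy parser for the rule  stat : ID '=' INT ';'  (a stand-in, not ANTLR's Parser). */
class ToyParser {
    private final List<String> tokens;
    private int pos = 0;
    private int syntaxErrors = 0;

    ToyParser(List<String> tokens) { this.tokens = tokens; }

    private String la() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    /** On a mismatch: print to stderr, count it, assume the token was there, and keep going. */
    private void match(String expected) {
        if (la().equals(expected)) {
            pos++;
        } else {
            syntaxErrors++;
            System.err.println("mismatched input '" + la() + "' expecting " + expected);
        }
    }

    void stat() { match("ID"); match("="); match("INT"); match(";"); }

    int getNumberOfSyntaxErrors() { return syntaxErrors; }
}

class CompileDriver {
    /** Returns true only if parsing finished without any (silently recovered) syntax error. */
    static boolean compile(List<String> tokens) {
        ToyParser parser = new ToyParser(tokens);
        parser.stat();                          // returns normally even for bad input
        if (parser.getNumberOfSyntaxErrors() > 0) {
            return false;                       // stop the compilation here
        }
        // ... later compilation phases would run here ...
        return true;
    }
}
```

Because `stat()` returns normally for both valid and invalid input, the error count is the only signal the caller gets.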

The Definitive ANTLR Reference offers a technique to stop parsing as soon as a syntax error is detected: override the mismatch and recoverFromMismatchedSet methods to throw RecognitionExceptions, and add a @rulecatch action to do the same.
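The fail-fast technique can be modeled the same way. In this sketch the classes are hypothetical toys, not the book's code or ANTLR's generated code: `mismatch` stands in for the recognizer's recovery hook, and `ruleCatch` (an illustrative name) stands in for the catch clause that ANTLR v3 generates inside each rule method. `FailFastParser` overrides both to throw, which is the role the overridden recovery methods and the `@rulecatch` action play in a real grammar.

```java
import java.util.List;

/** Toy stand-in for RecognitionException. */
class ToySyntaxError extends RuntimeException {
    ToySyntaxError(String msg) { super(msg); }
}

/** Hypothetical toy parser for  stat : ID '=' INT ';'  (not ANTLR's generated code). */
class ToyParser {
    private final List<String> tokens;
    private int pos = 0;
    int syntaxErrors = 0;

    ToyParser(List<String> tokens) { this.tokens = tokens; }

    String la() { return pos < tokens.size() ? tokens.get(pos) : "EOF"; }

    void match(String expected) {
        if (la().equals(expected)) { pos++; } else { mismatch(expected); }
    }

    /** Default: repair silently by assuming the expected token was present. */
    protected void mismatch(String expected) {
        syntaxErrors++;
    }

    /** Mimics a generated rule method, whose catch clause handles errors per rule. */
    void stat() {
        try {
            match("ID"); match("="); match("INT"); match(";");
        } catch (ToySyntaxError e) {
            ruleCatch(e);
        }
    }

    /** Default rule-level handling: report and carry on. */
    protected void ruleCatch(ToySyntaxError e) {
        syntaxErrors++;
        System.err.println(e.getMessage());
    }
}

/** Fail-fast variant: the recovery hook throws and the rule-level catch rethrows. */
class FailFastParser extends ToyParser {
    FailFastParser(List<String> tokens) { super(tokens); }

    @Override
    protected void mismatch(String expected) {
        throw new ToySyntaxError("mismatched input '" + la() + "' expecting " + expected);
    }

    @Override
    protected void ruleCatch(ToySyntaxError e) {
        throw e; // counterpart of a @rulecatch action that rethrows
    }
}
```

If only `mismatch` were overridden, the exception would be caught inside `stat()` and merely reported, which is why the rule-level catch has to be changed as well.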

I don't think you mentioned which version of ANTLR you're using, but the doc comment for recoverFromMismatchedSet in the ANTLR v3.4 source says it is "not currently used", and an Eclipse "global usage" scan found no callers. That is tangential to your main problem, but I wanted to mention it for the record; recoverFromMismatchedSet may still be the correct method to override in your version.

If a necessary token is missing ..., [the overridden code] throws an exception just as expected, but if an extraneous token is added, ANTLR inserts the token that it thinks belongs there and continues on its merry way...

Method recoverFromMismatchedToken tests for a recoverable missing or extraneous token by delegating to methods mismatchIsMissingToken and mismatchIsUnwantedToken, respectively. If one of them determines that a single-token insertion (for a missing token) or deletion (for an extraneous token) will fix the mismatch, recoverFromMismatchedToken makes that correction. If no such operation fixes the mismatch, recoverFromMismatchedToken throws a MismatchedTokenException.

If a recovery operation takes place, reportError is called, which calls displayRecognitionError with the details.
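The recovery path described above can be summarized in a simplified model. Everything below is a hypothetical sketch for the rule `stat : ID '=' INT ';'`: the method names mirror ANTLR v3.4's so the flow is easy to follow, but the signatures and bodies are simplified and this is not `org.antlr.runtime.BaseRecognizer`. In particular, the toy `follow` parameter is a single token rather than a follow set, and the `errorRecovery` flag is only a rough stand-in for the state juggling in `reportError`.

```java
import java.util.List;

/** Toy error type standing in for RecognitionException / MismatchedTokenException. */
class ToySyntaxError extends RuntimeException {
    ToySyntaxError(String msg) { super(msg); }
}

/**
 * Hypothetical, simplified model of the token-recovery path for  stat : ID '=' INT ';'
 * Method names mirror ANTLR's; signatures and bodies do not.
 */
class ToyRecognizer {
    private final List<String> tokens;
    private int pos = 0;
    private boolean errorRecovery = false; // stand-in for reportError's state juggling
    private int syntaxErrors = 0;

    ToyRecognizer(List<String> tokens) { this.tokens = tokens; }

    /** Lookahead: LA(1) is the current token, LA(2) the next one; "EOF" past the end. */
    String LA(int i) {
        int k = pos + i - 1;
        return k < tokens.size() ? tokens.get(k) : "EOF";
    }

    /** Match ttype; follow is the single token expected right after it (toy simplification). */
    void match(String ttype, String follow) {
        if (LA(1).equals(ttype)) {
            pos++;
            errorRecovery = false; // a clean match ends the recovery phase
            return;
        }
        recoverFromMismatchedToken(ttype, follow);
    }

    void recoverFromMismatchedToken(String ttype, String follow) {
        if (mismatchIsUnwantedToken(ttype)) {
            // Extraneous token: delete it, then consume the token we actually wanted.
            String extra = LA(1);
            pos++;
            reportError(new ToySyntaxError("extraneous input '" + extra + "' expecting " + ttype));
            pos++;
        } else if (mismatchIsMissingToken(follow)) {
            // Missing token: behave as if it had been inserted; consume nothing.
            reportError(new ToySyntaxError("missing " + ttype + " at '" + LA(1) + "'"));
        } else {
            // No single insertion or deletion helps: give up on this token.
            throw new ToySyntaxError("mismatched input '" + LA(1) + "' expecting " + ttype);
        }
    }

    boolean mismatchIsUnwantedToken(String ttype) { return LA(2).equals(ttype); }

    boolean mismatchIsMissingToken(String follow) { return LA(1).equals(follow); }

    void reportError(ToySyntaxError e) {
        if (errorRecovery) return; // already recovering: suppress follow-on reports
        syntaxErrors++;
        errorRecovery = true;
        displayRecognitionError(e);
    }

    void displayRecognitionError(ToySyntaxError e) {
        System.err.println(e.getMessage());
    }

    int getNumberOfSyntaxErrors() { return syntaxErrors; }

    void stat() {
        match("ID", "=");
        match("=", "INT");
        match("INT", ";");
        match(";", "EOF");
    }
}
```

In this toy, `List.of("ID", "=", "INT", "INT", ";")` parses "successfully" with one message on stderr, while `List.of("ID", ";")` cannot be fixed by a single insertion or deletion, so the exception escapes (the toy has no generated rule-level catch to intercept it).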

This applies to ANTLR v3.4 and possibly earlier versions.

This gives you at least two options:

  • Override recoverFromMismatchedToken and handle errors at a fine-grained level. From here you can delegate the call to the super implementation, roll your own recovery code, or bail out with an exception. Whatever the case, your code will be called and thus will be aware that a mismatch error occurred, recoverable or otherwise. This option is probably equivalent to overriding recoverFromMismatchedSet.

  • Override displayRecognitionError and handle the errors at a coarse-grained level. Method reportError does some state juggling, so I wouldn't recommend overriding it unless the overriding implementation calls the super-implementation. Method displayRecognitionError appears to be one of the last calls in the recovered-token call chain, so it would be a reasonable place to decide whether or not to continue. I would prefer it had a name indicating that it was a reasonable place for that, but oh well. Here is an answer that demonstrates this option.
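The first option might look like this in a toy model (hypothetical classes again, with simplified signatures rather than ANTLR's). `StrictRecognizer` overrides `recoverFromMismatchedToken`, records that a mismatch happened, and then either bails out with an exception or delegates to the default repair via `super`.

```java
import java.util.ArrayList;
import java.util.List;

class ToySyntaxError extends RuntimeException {
    ToySyntaxError(String msg) { super(msg); }
}

/** Hypothetical simplified recovery model for  stat : ID '=' INT ';'  (not ANTLR's API). */
class ToyRecognizer {
    private final List<String> tokens;
    private int pos = 0;
    private int syntaxErrors = 0;

    ToyRecognizer(List<String> tokens) { this.tokens = tokens; }

    String LA(int i) { int k = pos + i - 1; return k < tokens.size() ? tokens.get(k) : "EOF"; }

    void match(String ttype, String follow) {
        if (LA(1).equals(ttype)) { pos++; } else { recoverFromMismatchedToken(ttype, follow); }
    }

    void recoverFromMismatchedToken(String ttype, String follow) {
        if (LA(2).equals(ttype)) {          // extraneous token: delete it, consume the wanted one
            String extra = LA(1);
            pos += 2;
            reportError(new ToySyntaxError("extraneous input '" + extra + "' expecting " + ttype));
        } else if (LA(1).equals(follow)) {  // missing token: pretend it was inserted
            reportError(new ToySyntaxError("missing " + ttype + " at '" + LA(1) + "'"));
        } else {
            throw new ToySyntaxError("mismatched input '" + LA(1) + "' expecting " + ttype);
        }
    }

    void reportError(ToySyntaxError e) { syntaxErrors++; displayRecognitionError(e); }
    void displayRecognitionError(ToySyntaxError e) { System.err.println(e.getMessage()); }
    int getNumberOfSyntaxErrors() { return syntaxErrors; }

    void stat() { match("ID", "="); match("=", "INT"); match("INT", ";"); match(";", "EOF"); }
}

/** Option 1: intercept every mismatch, recoverable or not. */
class StrictRecognizer extends ToyRecognizer {
    final List<String> mismatches = new ArrayList<>();
    private final boolean bailOut;

    StrictRecognizer(List<String> tokens, boolean bailOut) { super(tokens); this.bailOut = bailOut; }

    @Override
    void recoverFromMismatchedToken(String ttype, String follow) {
        mismatches.add("expected " + ttype + ", saw " + LA(1)); // we now know a mismatch happened
        if (bailOut) {
            throw new ToySyntaxError("syntax error: expected " + ttype + " but found '" + LA(1) + "'");
        }
        super.recoverFromMismatchedToken(ttype, follow); // otherwise keep the default repair
    }
}
```

Delegating to `super` keeps the default behavior while still giving your code a chance to notice every mismatch; throwing instead turns any mismatch into a hard stop.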

I'm partial to overriding displayRecognitionError because it provides the error message text easily enough and because I know it's going to be called only after a token recovery operation and the required state juggling, so there's no need for my parser to figure out how to recover by itself. This, coupled with getNumberOfSyntaxErrors, appears to give you the options that you're looking for, assuming that you're working with a relevant version of ANTLR and that I fully understood your problem.
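The preferred approach, letting the default recovery run, capturing messages in `displayRecognitionError`, and checking `getNumberOfSyntaxErrors` afterwards, can be sketched with a toy model too. `reportError` is deliberately left alone, and `ToyRecognizer`, `CollectingRecognizer`, and `CompileDriver` are illustrative names with simplified signatures, not ANTLR API.

```java
import java.util.ArrayList;
import java.util.List;

class ToySyntaxError extends RuntimeException {
    ToySyntaxError(String msg) { super(msg); }
}

/** Hypothetical simplified recovery model for  stat : ID '=' INT ';'  (not ANTLR's API). */
class ToyRecognizer {
    private final List<String> tokens;
    private int pos = 0;
    private int syntaxErrors = 0;

    ToyRecognizer(List<String> tokens) { this.tokens = tokens; }

    String LA(int i) { int k = pos + i - 1; return k < tokens.size() ? tokens.get(k) : "EOF"; }

    void match(String ttype, String follow) {
        if (LA(1).equals(ttype)) { pos++; } else { recoverFromMismatchedToken(ttype, follow); }
    }

    void recoverFromMismatchedToken(String ttype, String follow) {
        if (LA(2).equals(ttype)) {          // extraneous token: delete it, consume the wanted one
            String extra = LA(1);
            pos += 2;
            reportError(new ToySyntaxError("extraneous input '" + extra + "' expecting " + ttype));
        } else if (LA(1).equals(follow)) {  // missing token: pretend it was inserted
            reportError(new ToySyntaxError("missing " + ttype + " at '" + LA(1) + "'"));
        } else {
            throw new ToySyntaxError("mismatched input '" + LA(1) + "' expecting " + ttype);
        }
    }

    // The real reportError also does some state bookkeeping; omitted in this toy.
    void reportError(ToySyntaxError e) { syntaxErrors++; displayRecognitionError(e); }
    void displayRecognitionError(ToySyntaxError e) { System.err.println(e.getMessage()); }
    int getNumberOfSyntaxErrors() { return syntaxErrors; }

    void stat() { match("ID", "="); match("=", "INT"); match("INT", ";"); match(";", "EOF"); }
}

/** Option 2: keep the default recovery, but capture the message text instead of printing it. */
class CollectingRecognizer extends ToyRecognizer {
    final List<String> messages = new ArrayList<>();

    CollectingRecognizer(List<String> tokens) { super(tokens); }

    @Override
    void displayRecognitionError(ToySyntaxError e) { messages.add(e.getMessage()); }
}

class CompileDriver {
    /** Returns the collected syntax errors; an empty list means compilation may continue. */
    static List<String> compile(List<String> tokens) {
        CollectingRecognizer parser = new CollectingRecognizer(tokens);
        parser.stat();
        if (parser.getNumberOfSyntaxErrors() > 0) {
            return parser.messages;         // stop the compilation and report these
        }
        // ... later compilation phases would run here ...
        return List.of();
    }
}
```

This collects every recovered error in one pass instead of stopping at the first, which suits a compiler that wants to report all syntax errors before giving up.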
