ANTLR 4.5 - Mismatched Input 'x' expecting 'x'

隐瞒了意图╮ 2020-12-01 01:10

I have started using ANTLR and have noticed that it is pretty fickle with its lexer rules. An extremely frustrating example is the following:

grammar          


        
2 Answers
  • 2020-12-01 01:30

    This was not directly OP's problem, but for those who have the same error message, here is something you could check.


    I got the same vague Mismatched Input 'x' expecting 'x' error message when I introduced a new keyword. In my case the cause was that I had placed the new keyword after my VARNAME lexer rule, so the lexer tokenized it as a variable name instead of as the new keyword. I fixed it by putting the keyword rules before the VARNAME rule.
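
    The ordering effect can be imitated outside ANTLR. The following is a minimal Python sketch, not ANTLR's implementation: the rule names IF and VARNAME, their patterns, and the helper `winning_rule` are hypothetical and chosen only for illustration. It shows why the parser can end up expecting the keyword token while receiving a VARNAME token with the same text, which would explain why the message shows the same text twice.

```python
import re

def winning_rule(rules, lexeme):
    """Return the first rule, in definition order, that matches the whole lexeme."""
    for name, pattern in rules:
        if re.fullmatch(pattern, lexeme):
            return name
    return None

# Hypothetical rule sets: the same two rules in opposite orders.
VARNAME_FIRST = [("VARNAME", r"[A-Za-z_]+"), ("IF", r"if")]
KEYWORD_FIRST = [("IF", r"if"), ("VARNAME", r"[A-Za-z_]+")]

print(winning_rule(VARNAME_FIRST, "if"))  # VARNAME: the keyword is swallowed
print(winning_rule(KEYWORD_FIRST, "if"))  # IF: the keyword rule wins the tie
```

    Both rules match `if` with the same length, so only the definition order decides which token is produced.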

  • 2020-12-01 01:40

    This seems to be a common misunderstanding of ANTLR:

    Language Processing in ANTLR:

    Language processing is done in two strictly separated phases:

    • Lexing, i.e. partitioning the text into tokens
    • Parsing, i.e. building a parse tree from the tokens

    Since lexing must precede parsing, there is a consequence: the lexer is independent of the parser, and the parser cannot influence lexing.

    Lexing

    Lexing in ANTLR works as follows:

    • all rules whose name starts with an uppercase character are lexer rules
    • the lexer starts at the beginning of the input and tries to find the rule that best matches the current input
    • the best match is the longest one, i.e. appending the next input character to it yields a string that no lexer rule matches
    • tokens are generated from matches:
      • if exactly one rule produces the longest match, the corresponding token is pushed into the token stream
      • if several rules produce the longest match, the token of the rule defined first in the grammar is pushed into the token stream
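
    The two rules above (longest match first, definition order as tie-break) can be sketched in a few lines of Python. This is only an illustration of the selection logic under simplifying assumptions, not how ANTLR is implemented: the rules IF, ID and WS and the function `tokenize` are hypothetical, and the patterns are kept to simple greedy character classes so that Python's `re` returns the longest match for each rule.

```python
import re

def tokenize(rules, text):
    """Longest match wins; on a tie, the rule defined first wins."""
    compiled = [(name, re.compile(pattern)) for name, pattern in rules]
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, regex in compiled:
            m = regex.match(text, pos)
            # Strictly greater: an equally long match from a later rule
            # never replaces the match of an earlier rule.
            if m and len(m.group()) > best_len:
                best_name, best_len = name, len(m.group())
        if best_name is None:
            raise ValueError(f"no rule matches at position {pos}")
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens

# Hypothetical rules, in definition order.
RULES = [("IF", r"if"), ("ID", r"[a-z]+"), ("WS", r" +")]

print(tokenize(RULES, "if x"))  # [('IF', 'if'), ('WS', ' '), ('ID', 'x')]
print(tokenize(RULES, "ifx"))   # [('ID', 'ifx')]
```

    In the first input IF and ID tie on `if`, so the earlier rule IF wins; in the second input ID matches three characters and beats IF regardless of order.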

    Example: What is wrong with your grammar

    Your grammar has two rules that are critical:

    FILEPATH: ('A'..'Z'|'a'..'z'|'0'..'9'|':'|'\\'|'/'|' '|'-'|'_'|'.')+ ;
    TITLE: ('A'..'Z'|'a'..'z'|' ')+ ;
    

    Every string matched by TITLE is also matched by FILEPATH, and FILEPATH is defined before TITLE. So every token that you expect to be a TITLE will be a FILEPATH.
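
    The overlap can be checked with a small Python sketch. The two patterns are hand-written approximations of the FILEPATH and TITLE rules as Python regular expressions, and `matching_rules` is a hypothetical helper; the sample strings are made up for illustration.

```python
import re

# Approximate Python translations of the two ANTLR rules (for illustration only).
FILEPATH = r"[A-Za-z0-9:\\/ \-_.]+"
TITLE = r"[A-Za-z ]+"

def matching_rules(text):
    """Names of the rules that match the whole text, in the grammar's order."""
    ordered = [("FILEPATH", FILEPATH), ("TITLE", TITLE)]
    return [name for name, pattern in ordered if re.fullmatch(pattern, text)]

print(matching_rules("My Title"))          # ['FILEPATH', 'TITLE']
print(matching_rules("C:\\docs\\a.txt"))   # ['FILEPATH']
```

    The first name in each result is the token the lexer would produce for that string. With TITLE defined first, `My Title` would become a TITLE, but so would any path that consists only of letters and spaces, so the parser rules have to allow for that.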

    There are three hints for that:

    • keep your lexer rules disjoint (no rule should match a superset of what another rule matches).
    • if your tokens intentionally match the same strings, then put the rules into the right order (in your case this will be sufficient).
    • if you need a parser-driven lexer, you have to switch to another parser generator: PEG parsers or GLR parsers will do that (but of course this can produce other problems).