Question
I have the following grammar:
rule: 'aaa' | 'a' 'a';
It can successfully parse the string 'aaa', but it fails to parse 'aa' with the following error:
line 1:2 mismatched character '<EOF>' expecting 'a'
Note that this is a lexer problem, not a parser problem: I don't even invoke the parser. The main function looks like this:
@members {
    public static void main(String[] args) throws Exception {
        RecipeLexer lexer = new RecipeLexer(new ANTLRInputStream(System.in));
        for (Token t = lexer.nextToken(); t.getType() != EOF; t = lexer.nextToken())
            System.out.println(t.getType());
    }
}
The result is the same with the more obvious version:
rule: AAA | A A;
AAA: 'aaa';
A: 'a';
Evidently the ANTLR lexer tries to match the input 'aa' against the rule AAA, which fails. Regardless of whether ANTLR's parser is LL(*) or something else, the lexer should work independently of the parser and should be able to resolve this ambiguity. The grammar works fine with good old lex (or flex), but apparently not with ANTLR. So what is the problem here?
Thanks for the help!
Answer 1:
ANTLR's generated parsers are (or can be) LL(*), not its lexers.
When the lexer sees the input "aa", it tries to match token AAA. When it fails to do so, it tries to match any other token that also matches "aa" (the lexer does not backtrack to match A!). Since this is not possible, an error is produced.
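To make the difference from lex/flex concrete, here is a rough sketch in plain Java. It is not ANTLR's actual implementation: the class LexerSketch and both methods are hypothetical and hard-code the two tokens AAA and A from the question. The committed method models a lexer that picks a token from limited lookahead and then insists on matching it in full, while longestMatch models the lex/flex approach of taking the longest literal that really matches.

```java
import java.util.ArrayList;
import java.util.List;

public class LexerSketch {

    // Simplified model of a lexer that commits to a token after limited
    // lookahead and never backtracks (an assumption, not ANTLR's real code).
    static List<String> committed(String in) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            // Two 'a' characters in a row can only start AAA, so commit to it.
            boolean predictAAA = in.startsWith("aa", pos);
            String literal = predictAAA ? "aaa" : "a";
            if (!in.startsWith(literal, pos)) {
                // No fallback to A here: the choice was already made.
                throw new IllegalStateException(
                    "mismatched input at " + pos + " while matching '" + literal + "'");
            }
            out.add(predictAAA ? "AAA" : "A");
            pos += literal.length();
        }
        return out;
    }

    // lex/flex-style model: take the longest literal that fully matches.
    static List<String> longestMatch(String in) {
        List<String> out = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            if (in.startsWith("aaa", pos)) {
                out.add("AAA");
                pos += 3;
            } else if (in.startsWith("a", pos)) {
                out.add("A");
                pos += 1;
            } else {
                throw new IllegalStateException("no token matches at " + pos);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(longestMatch("aa"));   // prints [A, A]
        System.out.println(committed("aaa"));     // prints [AAA]
        try {
            committed("aa");
        } catch (IllegalStateException e) {
            System.out.println("error: " + e.getMessage());
        }
    }
}
```

In this sketch both strategies agree on "aaa", but only the committed one fails on "aa", which mirrors the error reported in the question.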
This is usually not a problem, since in practice there's often some sort of identifier rule that "aa" can fall back to. So, what actual problem are you trying to solve, or were you only curious about the inner workings? If it's the former, please edit your question and describe your actual problem.
Source: https://stackoverflow.com/questions/12190501/antlr-lexer-cant-lookahead-at-all