Question
I have a combined grammar in which I need two identifier lexer rules. Both identifiers can be used at the same time. Identifier1 comes before Identifier2 in the grammar.
The first identifier is static, whereas the second identifier rule changes depending on a flag (using a predicate).
I want the second identifier to match in parser rules. But because both identifiers may match some common inputs, the input is never tokenized as Identifier2.
I have created a small grammar to illustrate the problem:
@lexer::members
{
    private boolean flag;
    public void setFlag(boolean flag)
    {
        this.flag = flag;
    }
}
identifier1 :
    ID1
    ;
identifier2 :
    ID2
    ;
ID1 : (CHARS)* ;
ID2 : (CHARS | ({flag}? '_'))* ;
fragment CHARS
    :
    ('a' .. 'z')
    ;
If I try to match the identifier2 rule like this:
ANTLRStringStream in = new ANTLRStringStream("abcabde");
IdTestLexer lexer = new IdTestLexer(in);
lexer.setFlag(true);
CommonTokenStream tokens = new CommonTokenStream(lexer);
IdTestParser parser = new IdTestParser(tokens);
parser.identifier2();
It shows the error: line 1:0 missing ID2 at 'abcabde'
Answer 1:
ID1 : (CHARS) *;
ID2 : (CHARS | ({flag}? '_'))* ;
For ANTLR these two rules mean:
- If the input is just characters, it's ID1
- If the input mixes characters and _ and flag == true, it's ID2
Note that if flag == false, ID2 will never be matched.
The two basic rules the Lexer follows are:
- It matches the token that covers the longest sub-sequence of input
- If multiple tokens can match the same input, use the one that comes first in the grammar
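The two rules above can be illustrated with a small standalone Java sketch. This is only a simplified model of the selection logic, not ANTLR's actual implementation: the class MiniLexer and the method firstToken are hypothetical names, and the lexer rules are approximated with regular expressions (the predicate is modeled by including or excluding '_' from ID2's character class).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical model of "longest match wins, ties go to the first rule".
// It does not use ANTLR; the rules are approximated with regexes.
public class MiniLexer {

    // Returns the name of the rule chosen for the start of the input,
    // or null if no rule matches at least one character.
    public static String firstToken(String input, boolean flag) {
        // LinkedHashMap keeps insertion order, mirroring rule order in the grammar.
        Map<String, Pattern> rules = new LinkedHashMap<>();
        rules.put("ID1", Pattern.compile("[a-z]*"));
        // Assumption: the {flag}? predicate simply enables or disables '_'.
        rules.put("ID2", Pattern.compile(flag ? "[a-z_]*" : "[a-z]*"));

        String best = null;
        int bestLen = 0;
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            // Strict '>' means a later rule wins only with a longer match,
            // so on a tie the earlier rule is kept.
            if (m.lookingAt() && m.end() > bestLen) {
                best = rule.getKey();
                bestLen = m.end();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(firstToken("abcabde", true)); // prints ID1
        System.out.println(firstToken("abc_de", true));  // prints ID2
    }
}
```

In this model, 'abcabde' is matched equally far by both rules, so the first one (ID1) wins regardless of the flag, which is consistent with the "missing ID2" error in the question. Only input that actually contains an underscore gives ID2 a longer match.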
I believe your core issue is a misunderstanding of the difference between the lexer and the parser and how each is used. The question you should ask yourself is: when should 'abcabde' be matched as ID1 and when as ID2?
- Always ID1 - then your grammar is correct as it is now.
- Always ID2 - then you should switch the two rules - but note that in such a case ID1 will never be matched.
- It depends on flag - then you need to modify the predicate according to your logic; just toggling the underscore isn't enough.
- It depends on where in the input the identifier is used - then this is not something that the lexer can decide, and you need to tell the two kinds of identifiers apart in the parser rather than the lexer. Formally, the lexer uses a regular language, while you need a context-free language to decide about the identifiers like that.
Source: https://stackoverflow.com/questions/51359630/antlr3-grammar-does-not-match-rule-with-predicate