Question
How to use lexer rules that start with the same characters?
I am trying to use two similar lexer rules that share a common prefix:
TIMECONSTANT: ('0'..'9')+ ':' ('0'..'9')+;
INTEGER : ('0'..'9')+;
COLON : ':';
Here is my sample grammar:
grammar TestTime;
text : (timeexpr | caseblock)*;
timeexpr : TIME;
caseblock : INT COLON ID;
TIME : ('0'..'9')+ ':' ('0'..'9')+;
INT : ('0'..'9')+;
COLON : ':';
ID : ('a'..'z')+;
WS : (' ' | '\t' | '\r' | '\n') {$channel=HIDDEN;};
When I try to parse this text:
12:44
123 : abc
123: abc
The first two lines are parsed correctly, but the third produces an error. For some reason, ANTLR tries to lex '123:' as TIME, which it is not.
So, is it possible to write a grammar with such lexemes?
I need rules like these because my language has both case blocks and date/time constants. For example, it is valid to write:
case MyInt of
1: a := 01.01.2012;
2: b := 12:44;
3: ....
end;
Answer 1:
As soon as DIGIT+ ':' is matched, the lexer expects another DIGIT to follow so that it can complete a TIMECONSTANT. If none follows, there is no other lexer rule that matches DIGIT+ ':' to fall back on, and the lexer will not give up the already matched ':' in order to produce an INTEGER.
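The failure mode described above can be mimicked with a small hand-written tokenizer. This is only a toy model in Python, not ANTLR's actual implementation: the function name `committed_lex` and the token tuples are made up for illustration, and the rules mirror TIME, INT, COLON, ID and WS from the question's grammar.

```python
def is_digit(ch):
    return "0" <= ch <= "9"


def committed_lex(text):
    """Toy lexer that commits to TIME once it has seen DIGIT+ ':'."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch in " \t\r\n":
            i += 1  # whitespace is dropped
        elif is_digit(ch):
            start = i
            while i < n and is_digit(text[i]):
                i += 1
            if i < n and text[i] == ":":
                # The ':' selects the TIME rule; there is no way back to INT.
                i += 1
                if i >= n or not is_digit(text[i]):
                    raise SyntaxError(f"expected a digit after ':' at offset {i}")
                while i < n and is_digit(text[i]):
                    i += 1
                tokens.append(("TIME", text[start:i]))
            else:
                tokens.append(("INT", text[start:i]))
        elif ch == ":":
            tokens.append(("COLON", ch))
            i += 1
        elif "a" <= ch <= "z":
            start = i
            while i < n and "a" <= text[i] <= "z":
                i += 1
            tokens.append(("ID", text[start:i]))
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens
```

With this model, `committed_lex("12:44")` and `committed_lex("123 : abc")` succeed, while `committed_lex("123: abc")` raises, which corresponds to the three input lines in the question.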
A possible solution is to optionally match ':' DIGIT+ at the end of the INTEGER rule and change the type of the token when that optional part is matched:
grammar T;
parse
: (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
;
INTEGER : DIGIT+ ((':' DIGIT)=> ':' DIGIT+ {$type=TIMECONSTANT;})?;
COLON : ':';
SPACE : ' ' {skip();};
fragment DIGIT : '0'..'9';
fragment TIMECONSTANT : ;
When parsing the input:
11: 12:13 : 14
the following will be printed:
INTEGER         '11'
COLON           ':'
TIMECONSTANT    '12:13'
COLON           ':'
INTEGER         '14'
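For readers who want to see the idea outside of ANTLR, here is a rough Python sketch of the same strategy: match an INTEGER first, peek two characters ahead, and only consume the ':' when a digit follows it. The function name `lex` and the tuple format are assumptions made for this illustration; the token names follow the grammar above.

```python
def is_digit(ch):
    return "0" <= ch <= "9"


def lex(text):
    """Tokenize digits, ':' and spaces, retyping INTEGER to TIMECONSTANT on lookahead."""
    tokens, i, n = [], 0, len(text)
    while i < n:
        ch = text[i]
        if ch == " ":
            i += 1  # like SPACE : ' ' {skip();};
        elif is_digit(ch):
            start = i
            while i < n and is_digit(text[i]):
                i += 1
            kind = "INTEGER"
            # Equivalent of the (':' DIGIT)=> check: look ahead without consuming.
            if i + 1 < n and text[i] == ":" and is_digit(text[i + 1]):
                i += 1  # consume ':'
                while i < n and is_digit(text[i]):
                    i += 1
                kind = "TIMECONSTANT"  # like {$type=TIMECONSTANT;}
            tokens.append((kind, text[start:i]))
        elif ch == ":":
            tokens.append(("COLON", ch))
            i += 1
        else:
            raise SyntaxError(f"unexpected character {ch!r} at offset {i}")
    return tokens


for kind, value in lex("11: 12:13 : 14"):
    print(f"{kind:<15} '{value}'")
```

Because the ':' is consumed only after the lookahead has succeeded, a lone ':' after an integer is left in the input and becomes a separate COLON token.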
EDIT
"Not too nice, but works..."
True. However, this is not an ANTLR shortcoming: most lexer generators I know of would have trouble tokenizing such a TIMECONSTANT properly when INTEGER and COLON are also present. ANTLR at least provides a way to handle it in the lexer :)
You could also let this be handled by the parser instead of the lexer:
time_const : INTEGER COLON INTEGER;
INTEGER : '0'..'9'+;
COLON : ':';
SPACE : ' ' {skip();};
However, if your language's lexer ignores whitespace, then input like:
12 : 34
would of course also be matched by the time_const rule.
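The whitespace caveat can be illustrated with a short Python sketch. It is not generated by ANTLR; `token_types` and `is_time_const` are hypothetical helpers that imitate the INTEGER, COLON and SPACE rules and the time_const parser rule above.

```python
import re

# Optional spaces are skipped, then either an INTEGER or a COLON is matched.
TOKEN = re.compile(r" *(?:(?P<INTEGER>[0-9]+)|(?P<COLON>:))")


def token_types(text):
    """Return the sequence of token type names, with spaces discarded."""
    text = text.rstrip(" ")
    types, pos = [], 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at offset {pos}")
        types.append(m.lastgroup)  # name of the group that matched
        pos = m.end()
    return types


def is_time_const(text):
    """Imitates the parser rule time_const : INTEGER COLON INTEGER;"""
    return token_types(text) == ["INTEGER", "COLON", "INTEGER"]
```

Since spaces never reach the parser, `is_time_const("12:34")` and `is_time_const("12 : 34")` give the same answer: the token types alone cannot tell the two inputs apart.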
Answer 2:
ANTLR lexers can't backtrack, which means that once the lexer reaches the ':' in the TIMECONSTANT rule, it must complete the rule or an exception will be thrown. You can get your grammar working by using a syntactic predicate to test for a digit following the colon.
TIMECONSTANT: ('0'..'9')+ (':' '0'..'9')=> ':' ('0'..'9')+;
INTEGER : ('0'..'9')+;
COLON : ':';
This will force ANTLR to look beyond the colon before it decides that it is in a TIMECONSTANT rule.
Source: https://stackoverflow.com/questions/10029137/antlr-how-to-use-lexer-rules-having-same-starting