Question
I have an ANTLR 4 grammar:
grammar Test;
start : NonZeroDigit '.' Digit Digit? EOF
;
DOT : '.' ;
PLUS : '+' ;
MINUS : '-' ;
COLON : ':' ;
COMMA : ',' ;
QUOTE : '\"' ;
EQUALS : '=' ;
SEMICOLON : ';' ;
UNDERLINE : '_' ;
BACKSLASH : '\\' ;
SINGLEQUOTE : '\'' ;
RESULT_TYPE_NONE : 'NONE' ;
RESULT_TYPE_RESULT : 'RESULT' ;
RESULT_TYPE_RESULT_SET : 'RESULT_SET' ;
TYPE_INT : 'Int' ;
TYPE_LONG : 'Long' ;
TYPE_BOOL : 'Bool' ;
TYPE_DATE : 'Date' ;
TYPE_DOUBLE : 'Double' ;
TYPE_STRING : 'String' ;
TYPE_INT_LIST : 'List<Int>' ;
TYPE_LONG_LIST : 'List<Long>' ;
TYPE_BOOL_LIST : 'List<Bool>' ;
TYPE_DATE_LIST : 'List<Date>' ;
TYPE_DOUBLE_LIST : 'List<Double>' ;
TYPE_STRING_LIST : 'List<String>' ;
LONG_END : 'L' ;
DOUBLE_END : 'd' ;
DATE_NOW : 'NOW' ;
BOOL_TRUE : 'true' ;
BOOL_FALSE : 'false' ;
BLOCK_OPEN : '{' ;
BLOCK_CLOSE : '}' ;
GENERIC_OPEN : '<' ;
GENERIC_CLOSE : '>' ;
BRACKET_OPEN : '(' ;
BRACKET_CLOSE : ')' ;
MAP : 'Map' ;
LIST : 'List' ;
GROUP : 'Group' ;
BY : 'by' ;
DEFAULT : 'default' ;
JSON_NAME : 'JSONName' ;
INTERFACE : 'interface' ;
CLASS : 'class' ;
ABSTRACT : 'abstract' ;
IMPLEMENTS : 'implements' ;
EXTENDS : 'extends' ;
CACHEABLE : 'cacheable' ;
FUNCTION : 'function' ;
REQUEST : 'request' ;
NAMED_QUERY : 'namedQuery' ;
INPUT : 'input' ;
OUTPUT : 'output' ;
RESULT_TYPE : 'resultType' ;
PACKAGE : 'package' ;
SCHEMA : 'schema' ;
VERSION : 'version' ;
MIN_VERSION : 'minVersion' ;
fragment
NonZeroDigit : [1-9]
;
fragment
Digit : '0' | NonZeroDigit
;
fragment
Digits : Digit+
;
fragment
IntegerNumber : '0' | ( NonZeroDigit Digits? )
;
fragment
SignedIntegerNumber : ( '+' | '-' )? IntegerNumber
;
fragment
FloatingNumber : IntegerNumber ( '.' Digits )?
;
fragment
SignedFloatingNumber : ( '+' | '-' )? FloatingNumber
;
fragment
Letter : [a-z]
;
fragment
Letters : Letter+
;
fragment
CapitalLetter : [A-Z]
;
fragment
CapitalLetters : CapitalLetter+
;
fragment
LetterOrDigitOrUnderline : Letter | CapitalLetter | Digit | '_'
;
fragment
EscapeSequence : ( '\\' ( 'b' | 't' | 'n' | 'f' | 'r' | '\"' | '\'' | '\\' ) )
| UnicodeEscape
| OctalEscape
;
fragment
HexDigit : [0-9] | [a-f] | [A-F]
;
fragment
UnicodeEscape : '\\' 'u' HexDigit HexDigit HexDigit HexDigit
;
fragment
OctalEscape : ( '\\' [0-3] [0-7] [0-7] )
| ( '\\' [0-7] [0-7] )
| ( '\\' [0-7] )
;
WS : [ \t\r\n]+ -> skip
;
I'm using it like this:
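import java.io.ByteArrayInputStream;
import org.antlr.v4.runtime.ANTLRInputStream;
import org.antlr.v4.runtime.CommonTokenStream;
final ByteArrayInputStream input = new ByteArrayInputStream("1.11".getBytes());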
final TestLexer lexer = new TestLexer(new ANTLRInputStream(input));
final TestParser parser = new TestParser(new CommonTokenStream(lexer));
parser.start();
But this gives me:
line 1:0 token recognition error at: '1'
line 1:2 token recognition error at: '1'
line 1:3 token recognition error at: '1'
line 1:1 missing NonZeroDigit at '.'
line 1:4 missing Digit at '<EOF>'
What am I doing wrong? I'm using ANTLR v4.1.
Thanks in advance for helping.
Answer 1:
Fragment lexer rules can only be used by other lexer rules: they never become tokens on their own. Therefore, you cannot use fragment rules in parser rules.
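To illustrate, here is a hypothetical hand-written lexer in Java (not ANTLR-generated code) that models a rule Number : NonZeroDigit '.' Digit Digit? with NonZeroDigit and Digit declared as fragments. The fragments are private helper methods called only from the Number rule, so the token list can contain Number tokens but never a NonZeroDigit or Digit token; a parser rule asking for NonZeroDigit could therefore never be satisfied. The class, the method names, and the Number rule are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer modeling:
//   Number : NonZeroDigit '.' Digit Digit? ;
//   fragment NonZeroDigit : [1-9] ;
//   fragment Digit : '0' | NonZeroDigit ;
// Fragments are private helpers used only by the Number rule; they never
// produce tokens of their own.
class FragmentSketch {

    // fragment NonZeroDigit : [1-9]
    private static boolean isNonZeroDigit(char c) {
        return c >= '1' && c <= '9';
    }

    // fragment Digit : '0' | NonZeroDigit
    private static boolean isDigit(char c) {
        return c == '0' || isNonZeroDigit(c);
    }

    // Number : NonZeroDigit '.' Digit Digit?
    // Returns how many characters Number matches at position i (0 = no match).
    private static int matchNumber(String s, int i) {
        int j = i;
        if (j >= s.length() || !isNonZeroDigit(s.charAt(j))) {
            return 0;
        }
        j++;
        if (j >= s.length() || s.charAt(j) != '.') {
            return 0;
        }
        j++;
        if (j >= s.length() || !isDigit(s.charAt(j))) {
            return 0;
        }
        j++;
        if (j < s.length() && isDigit(s.charAt(j))) {
            j++; // optional second Digit
        }
        return j - i;
    }

    // Only the non-fragment rule Number can appear in the output.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            int len = matchNumber(s, i);
            if (len == 0) {
                tokens.add("ERROR:" + s.charAt(i)); // no token rule matches here
                i++;
            } else {
                tokens.add("Number:" + s.substring(i, i + len));
                i += len;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("1.11")); // [Number:1.11]
    }
}
```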
Answer 2:
The fragment keyword is not the root cause.
First, try to reproduce your errors:
When compiling your Test.g4, the following warnings appear:
warning(156): Test.g4:11:21: invalid escape sequence \"
warning(156): Test.g4:123:59: invalid escape sequence \"
warning(146): Test.g4:11:0: non-fragment lexer rule QUOTE can match the empty string
warning(125): Test.g4:3:8: implicit definition of token NonZeroDigit in parser
warning(125): Test.g4:3:25: implicit definition of token Digit in parser
After removing the unused rules, the grammar becomes:
grammar Test;
start : NonZeroDigit '.' Digit Digit? EOF
;
fragment
NonZeroDigit : [1-9]
;
fragment
Digit : '0' | NonZeroDigit
;
Then compile it again and test it:
warning(125): Test.g4:3:8: implicit definition of token NonZeroDigit in parser
warning(125): Test.g4:3:25: implicit definition of token Digit in parser
line 1:0 token recognition error at: '1'
line 1:2 token recognition error at: '1'
line 1:3 token recognition error at: '1'
line 1:1 missing NonZeroDigit at '.'
line 1:4 missing Digit at '<EOF>'
(start <missing NonZeroDigit> . <missing Digit> <EOF>)
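A simplified model of what happens here, assuming a single-line input: with NonZeroDigit and Digit declared as fragments, the only token this reduced grammar's lexer can produce is the implicit '.' literal (the warnings show NonZeroDigit and Digit are only implicitly defined as token types in the parser, with no lexer rule producing them). Every digit is therefore reported and skipped, and the parser receives just '.' and EOF, hence the two "missing" errors. The Java class below is illustrative, not ANTLR's implementation; it only reproduces the message format.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (not ANTLR's implementation) of the lexer for the reduced
// grammar. NonZeroDigit and Digit are fragments, so the only token it can
// produce is the implicit '.' literal. Assumes single-line input (line 1).
class RecognitionErrorSketch {

    // Returns the token stream; unmatched characters are reported in `errors`
    // using ANTLR's message format and then skipped.
    static List<String> lex(String input, List<String> errors) {
        List<String> tokens = new ArrayList<>();
        for (int column = 0; column < input.length(); column++) {
            char c = input.charAt(column);
            if (c == '.') {
                tokens.add("'.'");
            } else {
                errors.add("line 1:" + column + " token recognition error at: '" + c + "'");
            }
        }
        tokens.add("<EOF>");
        return tokens;
    }

    public static void main(String[] args) {
        List<String> errors = new ArrayList<>();
        List<String> tokens = lex("1.11", errors);
        errors.forEach(System.out::println);
        System.out.println(tokens); // ['.', <EOF>]
    }
}
```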
When applying 'fragment'
When applying 'fragment' to NonZeroDigit and Digit, the grammar is equivalent to the following.
Replace NonZeroDigit with [1-9]:
grammar Test;
start : [1-9] '.' Digit Digit? EOF
;
fragment
Digit : '0' | [1-9]
;
Then replace Digit with ('0' | [1-9]):
grammar Test;
start : [1-9] '.' ('0' | [1-9]) ('0' | [1-9])? EOF
;
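The substitution steps above can be modeled as plain text rewriting. The helper below is hypothetical (ANTLR exposes nothing like it) and assumes fragments are not recursive; it inlines each fragment reference and adds parentheses when the fragment body contains alternatives.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper modeling the substitution above as text rewriting.
// Assumes fragments are not (mutually) recursive.
class FragmentInliner {

    static String inline(String body, Map<String, String> fragments) {
        String result = body;
        for (Map.Entry<String, String> fragment : fragments.entrySet()) {
            // \b...\b so that "Digit" is not matched inside "NonZeroDigit".
            Pattern ref = Pattern.compile("\\b" + Pattern.quote(fragment.getKey()) + "\\b");
            if (!ref.matcher(result).find()) {
                continue; // fragment not referenced here
            }
            // Expand the fragment's own body first (Digit refers to NonZeroDigit).
            String expansion = inline(fragment.getValue(), fragments);
            if (expansion.contains("|")) {
                expansion = "(" + expansion + ")"; // keep alternatives grouped
            }
            result = ref.matcher(result).replaceAll(Matcher.quoteReplacement(expansion));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> fragments = new LinkedHashMap<>();
        fragments.put("NonZeroDigit", "[1-9]");
        fragments.put("Digit", "'0' | NonZeroDigit");
        System.out.println(inline("NonZeroDigit '.' Digit Digit? EOF", fragments));
        // prints: [1-9] '.' ('0' | [1-9]) ('0' | [1-9])? EOF
    }
}
```

The printed rule is exactly the fully expanded start rule, which makes it easy to see that nothing but character-level elements remains.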
But the parser rule start (its name begins with a lowercase letter) cannot be made up entirely of character-level elements: character sets such as [1-9] are only allowed in lexer rules.
Refer to The Definitive ANTLR 4 Reference, page 73: "lexer rule names with uppercase letters and parser rule names with lowercase letters. For example, ID is a lexical rule name, and expr is a parser rule name."
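The convention quoted above amounts to a check on the first character of a rule name. This is a simplified sketch: it uses Character.isUpperCase rather than ANTLR's own grammar for rule names, and RuleNaming is an invented name.

```java
// Simplified sketch of the naming convention: a rule whose name starts with
// an uppercase letter is a lexer rule, otherwise it is a parser rule.
class RuleNaming {

    enum Kind { LEXER, PARSER }

    static Kind kindOf(String ruleName) {
        return Character.isUpperCase(ruleName.charAt(0)) ? Kind.LEXER : Kind.PARSER;
    }

    public static void main(String[] args) {
        String[] names = {"ID", "expr", "NonZeroDigit", "digit", "start", "Number"};
        for (String name : names) {
            System.out.println(name + " -> " + kindOf(name));
        }
    }
}
```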
After removing 'fragment'
After removing 'fragment' from the grammar, parsing 1.03 still gives an unexpected error:
line 1:3 extraneous input '3' expecting {<EOF>, Digit}
(start 1 . 0 3 <EOF>)
Error study:
For NonZeroDigit:
If we name it nonZeroDigit (making it a parser rule), we get:
syntax error: '1-9' came as a complete surprise to me while matching alternative
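This is because [1-9] is a character set (a lexical construct matching literal characters), which is only allowed in lexer rules. We therefore need to give the rule a name starting with an uppercase letter, which makes it a lexer rule.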
For Digit:
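It contains the identifier NonZeroDigit, so we need to give it a name starting with a lowercase letter (making it a parser rule). Kept as two separate lexer rules, NonZeroDigit and Digit both match a single character such as '3', and ANTLR resolves such a tie in favor of the rule defined first (NonZeroDigit); that is why '3' was reported as extraneous input.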
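A rough Java model of ANTLR's lexer tie-breaking, assuming the grammar with 'fragment' removed: when several rules match input of the same (longest) length, the rule defined first wins. Every rule here matches exactly one character, so for 1.03 the definition order alone decides, and '3' becomes a NonZeroDigit token rather than a Digit. The rule order (with the implicit '.' literal first, which does not affect the result since it overlaps with nothing) and the class names are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Rough model of ANTLR's lexer tie-break with 'fragment' removed: among rules
// matching the same (longest) input, the one defined first wins. Every rule
// below matches exactly one character, so definition order alone decides.
class TieBreakSketch {

    static final class Rule {
        final String name;
        final Predicate<Character> matches;

        Rule(String name, Predicate<Character> matches) {
            this.name = name;
            this.matches = matches;
        }
    }

    // Assumed order: implicit '.' literal, then NonZeroDigit, then Digit.
    static final List<Rule> RULES = List.of(
            new Rule("'.'", c -> c == '.'),
            new Rule("NonZeroDigit", c -> c >= '1' && c <= '9'),
            new Rule("Digit", c -> c == '0' || (c >= '1' && c <= '9')));

    static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<>();
        for (char c : input.toCharArray()) {
            String chosen = "ERROR"; // no rule matches this character
            for (Rule rule : RULES) {
                if (rule.matches.test(c)) {
                    chosen = rule.name; // first matching rule wins the tie
                    break;
                }
            }
            types.add(chosen);
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes("1.03"));
        // prints: [NonZeroDigit, '.', Digit, NonZeroDigit]
    }
}
```

The parser expects NonZeroDigit '.' Digit Digit? EOF, so after Digit ('0') it sees a NonZeroDigit ('3') it has no place for.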
The correct Test.g4 should be:
grammar Test;
start : NonZeroDigit '.' digit digit? EOF
;
NonZeroDigit : [1-9]
;
digit : '0' | NonZeroDigit
;
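To see how the corrected grammar behaves without running the ANTLR tool, here is a hand-written recursive-descent sketch in Java that mirrors it. It assumes, as ANTLR does for combined grammars, that the literals '.' and '0' used in parser rules become implicit tokens; since '0' does not overlap [1-9], the lexer never has to choose between overlapping rules. CorrectedGrammarSketch and its methods are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written model (not ANTLR output) of the corrected grammar:
//   start : NonZeroDigit '.' digit digit? EOF ;
//   digit : '0' | NonZeroDigit ;
//   NonZeroDigit : [1-9] ;
// The literals '.' and '0' are treated as implicit tokens.
class CorrectedGrammarSketch {

    private final List<String> tokens;
    private int pos = 0;

    private CorrectedGrammarSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    // Lexer: '.' and '0' are literal tokens, [1-9] is NonZeroDigit.
    static List<String> lex(String s) {
        List<String> types = new ArrayList<>();
        for (char c : s.toCharArray()) {
            if (c == '.' || c == '0') {
                types.add("'" + c + "'");
            } else if (c >= '1' && c <= '9') {
                types.add("NonZeroDigit");
            } else {
                throw new IllegalArgumentException("token recognition error at: '" + c + "'");
            }
        }
        types.add("EOF");
        return types;
    }

    // Consumes the current token if it has the given type.
    private boolean accept(String type) {
        if (tokens.get(pos).equals(type)) {
            pos++;
            return true;
        }
        return false;
    }

    // digit : '0' | NonZeroDigit
    private boolean digit() {
        return accept("'0'") || accept("NonZeroDigit");
    }

    // start : NonZeroDigit '.' digit digit? EOF
    private boolean start() {
        if (!accept("NonZeroDigit") || !accept("'.'") || !digit()) {
            return false;
        }
        digit(); // the second digit is optional
        return accept("EOF");
    }

    static boolean parses(String input) {
        return new CorrectedGrammarSketch(lex(input)).start();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.11", "1.03", "1.234", "0.1"}) {
            System.out.println(s + " -> " + parses(s));
        }
    }
}
```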
If you want to use fragment, you should create a lexer rule Number instead, because the rule consists ONLY of character-level elements (constant tokens), and a lexer rule's name must start with an uppercase letter, which start does not:
grammar Test;
start : Number EOF
;
Number : NonZeroDigit '.' Digit Digit?
;
fragment
NonZeroDigit : [1-9]
;
fragment
Digit : '0' | NonZeroDigit
;
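With this version the whole number is a single Number token, and the fragments are only building blocks inside it. As an analogy (not what ANTLR generates), the same lexer rule can be written as a java.util.regex pattern assembled from pieces named after the fragments; the hypothetical isNumber method checks whether an entire string is one Number token.

```java
import java.util.regex.Pattern;

// Analogy for the lexer rule
//   Number : NonZeroDigit '.' Digit Digit? ;
// The fragments exist only as named pieces of the pattern, never as
// separate matches of their own.
class NumberTokenSketch {

    static final String NON_ZERO_DIGIT = "[1-9]";               // fragment NonZeroDigit
    static final String DIGIT = "(?:0|" + NON_ZERO_DIGIT + ")"; // fragment Digit
    static final Pattern NUMBER =
            Pattern.compile(NON_ZERO_DIGIT + "\\." + DIGIT + DIGIT + "?");

    // True if the whole text is exactly one Number token.
    static boolean isNumber(String text) {
        return NUMBER.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1.11", "1.0", "0.5", "1.234"}) {
            System.out.println(s + " -> " + isNumber(s));
        }
    }
}
```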
Source: https://stackoverflow.com/questions/21369606/token-recognition-error-antlr