ANTLR trying to match token within longer token

Question

I'm new to ANTLR and am trying the following grammar in ANTLRWorks 1.4.3.

command
:   'go' SPACE+ 'to' SPACE+ destination
;

destination
:   (UPPER | LOWER) (UPPER | LOWER | DIGIT)*
;

SPACE
:   ' '
;

UPPER
:   'A'..'Z'
;

LOWER
:   'a'..'z'
;

DIGIT
:   '0'..'9'
;

This seems to work OK, except when the destination contains the first two characters of the keywords 'go' or 'to'. For instance, if I give the following command:

go to Glasgo

the node-tree is displayed as follows:

I was expecting it to match the full word as the destination.

I even tried changing the keyword, for example to 'travel' instead of 'go'. In that case, ANTLR complains if the destination contains 'tr'.

Any idea why this happens, and how to fix it?

Thanks in advance.

Answer 1:

ANTLR's lexer and parser are strictly separated. Your input is first tokenized, after which the parser rules operate on those tokens.

In your case, the input go to Glasgo is tokenized into the following 9 tokens:

'go'
' ' (SPACE)
'to'
' ' (SPACE)
'G' (UPPER)
'l' (LOWER)
'a' (LOWER)
's' (LOWER)
'go'

which leaves a "dangling" 'go' keyword. This is simply how ANTLR's lexer works: you cannot change this.
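The tokenization above can be reproduced with a small Python sketch. This is only a simplified model, not ANTLR itself: it assumes a plain longest-match lexer in which ties go to the rule listed first, whereas ANTLR 3's real lexer decides via lookahead-based prediction. For this particular input both approaches give the same token stream. The names ORIGINAL_RULES and tokenize are made up for the illustration.

```python
import re

# Token rules of the original grammar, in priority order (assumed model).
# The keyword literals 'go' and 'to' become implicit lexer tokens; every
# other rule matches exactly one character.
ORIGINAL_RULES = [
    ("'go'", re.compile(r"go")),
    ("'to'", re.compile(r"to")),
    ("SPACE", re.compile(r" ")),
    ("UPPER", re.compile(r"[A-Z]")),
    ("LOWER", re.compile(r"[a-z]")),
    ("DIGIT", re.compile(r"[0-9]")),
]


def tokenize(text, rules):
    """Simplified longest-match tokenizer; ties go to the rule listed first."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Only a strictly longer match replaces the current best.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        name, m = best
        tokens.append((name, m.group()))
        pos = m.end()
    return tokens


tokens = tokenize("go to Glasgo", ORIGINAL_RULES)
for name, value in tokens:
    print(f"{name:6} {value!r}")
```

The last token comes out as the keyword 'go' rather than two LOWER tokens, because no lexer rule in the original grammar can match more than one letter at a time except the keyword literals.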

A possible solution in your case would be to make destination a lexer rule instead of a parser rule:

command
:   'go' 'to' DESTINATION
;

DESTINATION
:   (UPPER | LOWER) (UPPER | LOWER | DIGIT)*
;

SPACE
:   ' ' {skip();}
;

fragment UPPER
:   'A'..'Z'
;

fragment LOWER
:   'a'..'z'
;

fragment DIGIT
:   '0'..'9'
;

resulting in:
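In terms of tokens, the effect of the revised grammar can be approximated with the following Python sketch. It is a simplified longest-match model rather than ANTLR's actual algorithm, and REVISED_RULES and tokenize_revised are hypothetical names chosen for the illustration. The fragment rules are folded into the DESTINATION pattern, and SPACE tokens are dropped to mirror skip().

```python
import re

# Lexer rules of the revised grammar (assumed model). Keyword literals come
# first so that they win a tie against DESTINATION on the exact text "go"/"to".
REVISED_RULES = [
    ("'go'", re.compile(r"go")),
    ("'to'", re.compile(r"to")),
    ("DESTINATION", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("SPACE", re.compile(r" ")),
]


def tokenize_revised(text):
    """Longest match wins; ties go to the earlier rule; SPACE is skipped."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None
        for name, pattern in REVISED_RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError(f"no rule matches at position {pos}: {text[pos]!r}")
        name, m = best
        pos = m.end()
        if name != "SPACE":  # mirrors the skip() action on SPACE
            tokens.append((name, m.group()))
    return tokens


print(tokenize_revised("go to Glasgo"))
```

Because DESTINATION can now consume the whole word, "Glasgo" becomes a single token. One side effect worth knowing: a destination spelled exactly like a keyword (for example "go to go") would still be lexed as the keyword, so the parser rule would not accept it.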

If you're not entirely sure what the difference between the two is, see: Practical difference between parser rules and lexer rules in ANTLR?

More about fragments: What does "fragment" mean in ANTLR?

PS. Glasgow?

Source: https://stackoverflow.com/questions/11902168/antlr-trying-to-match-token-within-longer-token

Tags

antlr3