ANTLR trying to match token within longer token

Submitted by 泄露秘密 on 2019-12-08 06:45:23

Question


I'm new to ANTLR and am trying the following grammar in ANTLRWorks 1.4.3.

command
:   'go' SPACE+ 'to' SPACE+ destination
;

destination
:   (UPPER | LOWER) (UPPER | LOWER | DIGIT)*
;

SPACE
:   ' '
;

UPPER
:   'A'..'Z'
;

LOWER
:   'a'..'z'
;

DIGIT
:   '0'..'9'
;

This seems to work OK, except when the 'destination' contains the first two characters of the keywords 'go' or 'to'. For instance, if I give the following command:

go to Glasgo

the node-tree is displayed as follows:

I was expecting it to match the full word as the destination.

I even tried changing the keyword, for example to 'travel' instead of 'go'. In that case, if the destination contains 'tr', ANTLR complains.

Any idea why this happens, and how to fix it?

Thanks in advance.


Answer 1:


ANTLR's lexer and parser are strictly separated. Your input is first tokenized, after which the parser rules operate on those tokens.

In your case, the input go to Glasgo is tokenized into the following 9 tokens:

  1. 'go'
  2. ' ' (SPACE)
  3. 'to'
  4. ' ' (SPACE)
  5. 'G' (UPPER)
  6. 'l' (LOWER)
  7. 'a' (LOWER)
  8. 's' (LOWER)
  9. 'go'

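which leaves a "dangling" 'go' keyword that the destination rule cannot consume. This is simply how ANTLR's lexer works: it produces tokens without regard to what the parser expects at that point, and you cannot change this.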
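
To make the tokenization concrete, here is a rough Python sketch that imitates what the lexer does with the original grammar. It is not ANTLR's actual algorithm (ANTLR 3 generates a DFA-based lexer), and the function name `tokenize_original` and the token labels are invented for illustration. It assumes only that a keyword literal wins whenever it matches at the current position, which is enough to reproduce the behavior for this input.

```python
# Simplified imitation of the lexer implied by the ORIGINAL grammar.
# Assumption: the keyword literals 'go' and 'to' win whenever they match
# at the current position; otherwise one character is classified.
KEYWORDS = ("go", "to")


def tokenize_original(text):
    tokens = []
    i = 0
    while i < len(text):
        for kw in KEYWORDS:
            if text.startswith(kw, i):
                tokens.append(("KEYWORD", kw))
                i += len(kw)
                break
        else:
            # No keyword matched here: fall back to the one-character rules.
            ch = text[i]
            if ch == " ":
                kind = "SPACE"
            elif "A" <= ch <= "Z":
                kind = "UPPER"
            elif "a" <= ch <= "z":
                kind = "LOWER"
            elif "0" <= ch <= "9":
                kind = "DIGIT"
            else:
                raise ValueError(f"no lexer rule matches {ch!r} at index {i}")
            tokens.append((kind, ch))
            i += 1
    return tokens


if __name__ == "__main__":
    for kind, value in tokenize_original("go to Glasgo"):
        print(kind, repr(value))
```

The last token printed is the keyword 'go' rather than two LOWER tokens, which is why the parser rule for destination stops after 'G', 'l', 'a', 's'.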

A possible solution in your case would be to make destination a lexer rule instead of a parser rule:

command
:   'go' 'to' DESTINATION
;

DESTINATION
:   (UPPER | LOWER) (UPPER | LOWER | DIGIT)*
;

SPACE
:   ' ' {skip();}
;

fragment UPPER
:   'A'..'Z'
;

fragment LOWER
:   'a'..'z'
;

fragment DIGIT
:   '0'..'9'
;

resulting in:


If you're not entirely sure what the difference between the two is, see: Practical difference between parser rules and lexer rules in ANTLR?

More about fragments: What does "fragment" mean in ANTLR?
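
The effect of turning destination into a lexer rule can also be imitated in Python. This is only a sketch under stated assumptions: it assumes the lexer takes the longest run of letters and digits as a single token and treats that run as a keyword only when the whole run equals 'go' or 'to'. The names `tokenize_fixed` and `TOKEN_RE` are hypothetical and not part of ANTLR.

```python
import re

KEYWORDS = {"go", "to"}

# One alternative per lexer rule: a letter followed by letters or digits
# (the DESTINATION shape), or a single space.
TOKEN_RE = re.compile(r"(?P<WORD>[A-Za-z][A-Za-z0-9]*)|(?P<SPACE> )")


def tokenize_fixed(text):
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"no lexer rule matches {text[pos]!r} at index {pos}")
        pos = m.end()
        if m.lastgroup == "SPACE":
            continue  # mirrors the skip() action on the SPACE rule
        word = m.group("WORD")
        # A whole word that equals a keyword stays a keyword;
        # anything else becomes a single DESTINATION token.
        kind = "KEYWORD" if word in KEYWORDS else "DESTINATION"
        tokens.append((kind, word))
    return tokens


if __name__ == "__main__":
    print(tokenize_fixed("go to Glasgo"))
```

Here 'Glasgo' comes out as one DESTINATION token because the whole word is matched before any keyword check happens. Note that in this sketch a destination spelled exactly 'go' or 'to' would still be reported as a keyword.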


PS. Glasgow?



Source: https://stackoverflow.com/questions/11902168/antlr-trying-to-match-token-within-longer-token
