Generated Antlr Parser in Java: Not all inputs are read

时光总嘲笑我的痴心妄想 提交于 2019-12-12 00:21:14

问题


I am working on my Antlr grammar to parse polynomial functions in multiple variables using Java. Examples for legal input are

42; X; +42X; Y^42; 1337HelloWorld; 13,37X^42; 

The following grammar does compile without warnings or errors:

grammar Function;

parseFunction returns [java.util.List<java.util.List<Object>> list] :   
    { list = new java.util.ArrayList(); }                                              ( f=functionPart { list.add($f.list); } )+
|   { list = new java.util.ArrayList(); } ( fb=functionBegin ) { list.add($fb.list); } ( f=functionPart { list.add($f.list); } )*
;

functionBegin returns [java.util.List<Object> list]:
m=NUMBER v=VARIABLE e=exponent  { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); list.add($v.text); list.add($e.value); }
| m=NUMBER v=VARIABLE           { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); list.add($v.text); }
| v=VARIABLE e=exponent         { list = new java.util.ArrayList(); list.add("+"); list.add("1");     list.add($v.text); list.add($e.value); }  
| v=VARIABLE                    { list = new java.util.ArrayList(); list.add("+"); list.add("1");     list.add($v.text); }
| m=NUMBER                      { list = new java.util.ArrayList(); list.add("+"); list.add($m.text); }
;

functionPart returns [java.util.List<Object> list] :    
s=SIGN m=NUMBER v=VARIABLE e=exponent   { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); list.add($v.text); list.add($e.value); }
| s=SIGN m=NUMBER v=VARIABLE            { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); list.add($v.text); }
| s=SIGN v=VARIABLE e=exponent          { list = new java.util.ArrayList(); list.add($s.text); list.add("1");     list.add($v.text); list.add($e.value); }
| s=SIGN v=VARIABLE                     { list = new java.util.ArrayList(); list.add($s.text); list.add("1");     list.add($v.text); }
| s=SIGN m=NUMBER                       { list = new java.util.ArrayList(); list.add($s.text); list.add($m.text); }
;

exponent returns [int value]: ('^' n=INTEGER) { $value = 1; if ( $n != null && $n.text.length() > 0) $value = Integer.parseInt($n.text); }
;

VARIABLE    : ('a'..'z'|'A'..'Z')+
;

INTEGER : ('0'..'9')+
;

NUMBER  : ('0'..'9')+ (','('0'..'9')+)?
;

SIGN    :   ('+'|'-')
;

WS  :    (' ' | '\t' | '\r'| '\n')+ {skip();} 
;

This grammar, if compiled and used in Java does accept most input values. Apparently, not all valid input values are accepted. As soon as a number not using a comma pops up, like the inputs

+42; 42; 42X^1337; 

an error is thrown (error from input "+42"):

line 1:1 no viable alternative at input '+'

The error is not thrown if I modify the inputs to

+42,0; 42,0; 42,0X^1337

Can anyone say, why and how to fix it?


回答1:


The first lexer rule with the longest match wins, thus 42 is an INTEGER, and NUMBER in fact only matches when the comma part is present, i.e. when NUMBER has a longer match than INTEGER.

This can be fixed by adding a parser rule

number : NUMBER | INTEGER ;

and using that instead of NUMBER from other parser rules.



来源:https://stackoverflow.com/questions/11736681/generated-antlr-parser-in-java-not-all-inputs-are-read

标签
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!