Antlr parser for and/or logic - how to get expressions between logic operators?

Posted by 南笙酒味 on 2019-12-23 03:55:58

Question


I am using ANTLR to create an and/or parser and evaluator. Expressions will have a format like:

  • x eq 1 && y eq 10
  • (x lt 10 && x gt 1) OR x eq -1

I was reading the post "Looking for advice on project. Parsing logical expression" about logic expressions in ANTLR, and I found the grammar posted there a good start:

grammar Logic;

parse
  :  expression EOF
  ;

expression
  :  implication
  ;

implication
  :  or ('->' or)*
  ;

or
  :  and ('||' and)*
  ;

and
  :  not ('&&' not)*
  ;

not
  :  '~' atom
  |  atom
  ;

atom
  :  ID
  |  '(' expression ')'
  ;

ID    : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};

However, while getting a tree from the parser works for expressions where the variables are just one character (e.g. "(A || B) AND C"), I am having a hard time adapting this to my case. For the example "x eq 1 && y eq 10" I would expect one "AND" parent with two children, "x eq 1" and "y eq 10" (see the test case below).

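// Assumes the ANTLR 3 runtime and JUnit 4 are on the classpath.
import org.antlr.runtime.*;
import org.antlr.runtime.tree.CommonTree;
import org.junit.Test;
import static org.junit.Assert.assertEquals;

@Test
public void simpleAndEvaluation() throws RecognitionException {
    // Both operands are comparisons, matching the assertions below.
    String src = "1 eq 1 && a neq a";

    LogicLexer lexer = new LogicLexer(new ANTLRStringStream(src));
    LogicParser parser = new LogicParser(new CommonTokenStream(lexer));

    CommonTree tree = (CommonTree) parser.parse().getTree();

    // Expected shape: "&&" at the root, one comparison per child.
    assertEquals("&&", tree.getText());
    assertEquals("1 eq 1", tree.getChild(0).getText());
    assertEquals("a neq a", tree.getChild(1).getText());
}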

I believe this is related to the "ID" rule. What would the correct syntax be?


Answer 1:


For those interested, I made some improvements to my grammar file (see below).

Current limitations:

  • only works with &&/||, not AND/OR (not very problematic)

  • you can't have spaces between the parentheses and the &&/|| operators (I work around that by replacing " (" with "(" and ") " with ")" in the source String before feeding it to the lexer)

    grammar Logic;

    options {
      output = AST;
    }
    
    tokens {
      AND = '&&';
      OR  = '||';
      NOT = '~';
    }
    
    // parser/production rules start with a lower case letter
    parse
      :  expression EOF!    // omit the EOF token
      ;
    
    expression
      :  or
      ;
    
    or
      :  and (OR^ and)*    // make `||` the root
      ;
    
    and
      :  not (AND^ not)*      // make `&&` the root
      ;
    
    not
      :  NOT^ atom    // make `~` the root
      |  atom
      ;
    
    atom
      :  ID
      |  '('! expression ')'!    // omit both `(` and `)`
      ;
    
    // lexer/terminal rules start with an upper case letter
    ID
      :
        (
        'a'..'z'
        | 'A'..'Z'
        | '0'..'9' | ' '
        | SYMBOL
      )+ 
      ;
    
    SYMBOL
      :
        ('+'|'-'|'*'|'/'|'_')
     ;
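
The whitespace workaround listed under the limitations could be sketched in plain Java roughly as follows. The class and method names (`ParenSpacing`, `normalize`) are hypothetical and not part of the original answer, and the sketch assumes at most one space next to each parenthesis, since `String.replace` removes only the single space it matches.

```java
// Hypothetical helper; the names are illustrative and not from the original answer.
final class ParenSpacing {

    // Drops a single space before '(' and a single space after ')',
    // so that the ID rule (which accepts spaces) does not swallow them.
    static String normalize(String src) {
        return src.replace(" (", "(").replace(") ", ")");
    }

    public static void main(String[] args) {
        // The result would then be passed to new ANTLRStringStream(...).
        System.out.println(normalize("(x lt 10 && x gt 1) || (x eq -1)"));
    }
}
```

Spaces inside an operand such as "x lt 10" are left alone, because the ID rule in this grammar deliberately accepts them.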
    



Answer 2:


ID    : ('a'..'z' | 'A'..'Z')+;

states that an identifier is a sequence of one or more letters, but does not allow any digits. Try

ID    : ('a'..'z' | 'A'..'Z' | '0'..'9')+;

which will allow e.g. abc, 123, 12ab, and ab12. If you don't want the latter types, you'll have to restructure the rule a little bit (left as a challenge...)
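
To see which strings each shape of the rule accepts, the character classes can be mirrored with `java.util.regex`. This is only an illustration in plain Java, not ANTLR syntax. The second pattern is one possible reading of "restructuring" (a letter first, then letters or digits) and is an assumption rather than something the answer specifies. The names `IdShapes`, `LETTERS_OR_DIGITS` and `LETTER_FIRST` are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical illustration; not generated by or tied to ANTLR.
final class IdShapes {

    // Mirrors: ID : ('a'..'z' | 'A'..'Z' | '0'..'9')+ ;
    static final Pattern LETTERS_OR_DIGITS = Pattern.compile("[a-zA-Z0-9]+");

    // Assumed stricter shape: one letter, then any number of letters or digits.
    static final Pattern LETTER_FIRST = Pattern.compile("[a-zA-Z][a-zA-Z0-9]*");

    // True only when the whole string matches the pattern.
    static boolean matches(Pattern p, String s) {
        return p.matcher(s).matches();
    }
}
```

Under the first pattern abc, 123, 12ab and ab12 all match. Under the assumed stricter one, abc and ab12 match while 123 and 12ab do not.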

In order to accept arbitrarily many identifiers, you could define atom as ID+ instead of ID.

Also, you will likely need to specify AND, OR, -> and ~ as tokens so that, as @Bart Kiers says, the first two won't get classified as ID, and so that the latter two will get recognized at all.
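
The point about operators being declared as their own tokens can be illustrated with a small plain-Java classifier that checks the operator spellings before falling back to the identifier shape. This is a sketch of the idea only, not of how ANTLR resolves tokens internally. The class `TokenKinds`, the method `classify` and the kind names it returns are hypothetical.

```java
// Hypothetical illustration of "operators first, identifiers second".
final class TokenKinds {

    // Classifies one whitespace-separated word from the input.
    static String classify(String word) {
        switch (word) {
            case "AND":
            case "&&":
                return "AND";
            case "OR":
            case "||":
                return "OR";
            case "->":
                return "IMPLIES";
            case "~":
                return "NOT";
            default:
                // Only reached when no operator spelling matched.
                return word.matches("[a-zA-Z0-9]+") ? "ID" : "UNKNOWN";
        }
    }
}
```

If the identifier check ran first, "AND" and "OR" would be reported as ID, which is the misclassification the answer warns about.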



Source: https://stackoverflow.com/questions/9509048/antlr-parser-for-and-or-logic-how-to-get-expressions-between-logic-operators
