Question
I am using ANTLR to create an and/or parser and evaluator. Expressions will have a format like:
x eq 1 && y eq 10
(x lt 10 && x gt 1) OR x eq -1
I was reading this post on logic expressions in ANTLR, "Looking for advice on project. Parsing logical expression", and I found the grammar posted there a good start:
grammar Logic;
parse
  : expression EOF
  ;
expression
  : implication
  ;
implication
  : or ('->' or)*
  ;
or
  : and ('||' and)*
  ;
and
  : not ('&&' not)*
  ;
not
  : '~' atom
  | atom
  ;
atom
  : ID
  | '(' expression ')'
  ;
ID    : ('a'..'z' | 'A'..'Z')+;
Space : (' ' | '\t' | '\r' | '\n')+ {$channel=HIDDEN;};
However, while getting a tree from the parser works for expressions where the variables are just one character (i.e., "(A || B) AND C"), I am having a hard time adapting this to my case. In the example "x eq 1 && y eq 10" I'd expect one "AND" parent and two children, "x eq 1" and "y eq 10" (see the test case below).
@Test
public void simpleAndEvaluation() throws RecognitionException {
    String src = "1 eq 1 && B";
    LogicLexer lexer = new LogicLexer(new ANTLRStringStream(src));
    LogicParser parser = new LogicParser(new CommonTokenStream(lexer));
    CommonTree tree = (CommonTree) parser.parse().getTree();
    // the root should be the operator, with one child per operand
    assertEquals("&&", tree.getText());
    assertEquals("1 eq 1", tree.getChild(0).getText());
    assertEquals("B", tree.getChild(1).getText());
}
I believe this is related to the "ID" rule. What would the correct syntax be?
Answer 1:
For those interested, I made some improvements to my grammar file (see below).
Current limitations:
only works with &&/||, not AND/OR (not very problematic)
you can't have spaces between the parentheses and the &&/|| (I solve that by replacing " (" with "(" and ") " with ")" in the source String before feeding it to the lexer)
grammar Logic;
options {
  output = AST;
}
tokens {
  AND = '&&';
  OR = '||';
  NOT = '~';
}
// parser/production rules start with a lower case letter
parse
  : expression EOF! // omit the EOF token
  ;
expression
  : or
  ;
or
  : and (OR^ and)* // make `||` the root
  ;
and
  : not (AND^ not)* // make `&&` the root
  ;
not
  : NOT^ atom // make `~` the root
  | atom
  ;
atom
  : ID
  | '('! expression ')'! // omit both `(` and `)`
  ;
// lexer/terminal rules start with an upper case letter
ID
  : ( 'a'..'z' | 'A'..'Z' | '0'..'9' | ' ' | SYMBOL )+
  ;
SYMBOL
  : ('+'|'-'|'*'|'/'|'_')
  ;
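The whitespace workaround from the limitations list could look roughly like this in Java. This is only a sketch: the names `LogicPreprocessor` and `stripParenSpaces` are made up for illustration, and it assumes at most one space next to each parenthesis.

```java
final class LogicPreprocessor {
    // Hypothetical helper: drops the space before "(" and after ")" so that
    // the space-accepting ID rule never sees whitespace next to a parenthesis.
    static String stripParenSpaces(String src) {
        return src.replace(" (", "(").replace(") ", ")");
    }

    public static void main(String[] args) {
        // prints: (a)&&(b)
        System.out.println(stripParenSpaces("(a) && (b)"));
    }
}
```

The cleaned string is what would then be passed to `ANTLRStringStream`. Spaces inside an operand such as `x eq 1` are left alone, since the ID rule above accepts them.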
Answer 2:
ID : ('a'..'z' | 'A'..'Z')+;
states that an identifier is a sequence of one or more letters, but does not allow any digits. Try
ID : ('a'..'z' | 'A'..'Z' | '0'..'9')+;
which will allow e.g. abc, 123, 12ab, and ab12. If you don't want the latter types, you'll have to restructure the rule a little bit (left as a challenge...)
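To check which strings each version of the ID rule accepts, here is a rough Java sketch that uses regular expressions as stand-ins for the two lexer rules. The regexes are my own approximation of the rules, not something ANTLR generates, and `IdRuleDemo` is a hypothetical name.

```java
import java.util.regex.Pattern;

final class IdRuleDemo {
    // Regex stand-in for: ID : ('a'..'z' | 'A'..'Z')+;
    static final Pattern LETTERS_ONLY = Pattern.compile("[a-zA-Z]+");
    // Regex stand-in for: ID : ('a'..'z' | 'A'..'Z' | '0'..'9')+;
    static final Pattern LETTERS_AND_DIGITS = Pattern.compile("[a-zA-Z0-9]+");

    // True only if the whole text is matched by the rule.
    static boolean matches(Pattern rule, String text) {
        return rule.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"abc", "123", "12ab", "ab12"}) {
            System.out.println(s + ": letters-only=" + matches(LETTERS_ONLY, s)
                    + ", letters+digits=" + matches(LETTERS_AND_DIGITS, s));
        }
    }
}
```

Neither pattern accepts a space, which is why "x eq 1" still comes out as three separate tokens rather than a single ID.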
In order to accept arbitrarily many identifiers, you could define atom as ID+ instead of ID.
Also, you will likely need to specify AND, OR, -> and ~ as tokens so that, as @Bart Kiers says, the first two won't get classified as ID, and so that the latter two will get recognized at all.
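The idea of trying operator tokens before the identifier rule can be illustrated with a small hand-written tokenizer in plain Java. This is only an analogy for what the generated lexer does, not ANTLR's API. The class name `KeywordFirstLexer` and the `OP:`/`ID:` labels are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeywordFirstLexer {
    // The operator alternatives are listed before the identifier alternative,
    // so "AND" and "OR" are tried as operators first. The \b keeps a longer
    // word such as "ORDER" from being split into "OR" + "DER".
    private static final Pattern TOKEN = Pattern.compile(
            "(?<op>AND\\b|OR\\b|->|~)|(?<id>[a-zA-Z0-9]+)");

    static List<String> tokenize(String src) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(src);
        // find() silently skips anything that matches neither alternative,
        // which here includes whitespace.
        while (m.find()) {
            String kind = m.group("op") != null ? "OP" : "ID";
            out.add(kind + ":" + m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints: [ID:A, OP:AND, OP:~, ID:B, OP:->, ID:ORDER]
        System.out.println(tokenize("A AND ~B -> ORDER"));
    }
}
```

If the identifier alternative came first, "AND" and "OR" would be reported as ID, which is the misclassification described above.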
Source: https://stackoverflow.com/questions/9509048/antlr-parser-for-and-or-logic-how-to-get-expressions-between-logic-operators