Question
How can the negation meta-character, ~, be used in ANTLR's lexer and parser rules?
Answer 1:
Negation can occur inside both lexer and parser rules.
Inside lexer rules you can negate characters, and inside parser rules you can negate tokens (lexer rules). In both cases only a single unit can be negated: a single character (or a set of single characters) in a lexer rule, and a single token (or a set of single tokens) in a parser rule.
A couple of examples:
lexer rules
To match one or more characters except lowercase ASCII letters, you can do:
NO_LOWERCASE : ~('a'..'z')+ ;
(The negation meta-character, ~, has a higher precedence than +, so the rule above is equivalent to (~('a'..'z'))+ .)
Note that 'a'..'z' matches a single character (and can therefore be negated), but the following rule is invalid:
ANY_EXCEPT_AB : ~('ab') ;
Because 'ab' matches two characters, it cannot be negated. To match a token that consists of two characters but is not 'ab', you have to do the following:
ANY_EXCEPT_AB
: 'a' ~'b' // any two chars starting with 'a' followed by any other than 'b'
| ~'a' . // other than 'a' followed by any char
;
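For readers who think in regular expressions, the two lexer rules above can be approximated in Python. This is only an analogy, not ANTLR itself: the names NO_LOWERCASE, ANY_EXCEPT_AB and matches_whole are hypothetical helpers introduced for this sketch, and it assumes that a negated character set behaves like a regex negated character class.

```python
import re

# Rough regex analogues of the lexer rules (an illustration, not ANTLR).

# NO_LOWERCASE : ~('a'..'z')+ ;
# One or more characters, none of which is a lowercase ASCII letter.
NO_LOWERCASE = re.compile(r"[^a-z]+")

# ANY_EXCEPT_AB : 'a' ~'b' | ~'a' . ;
# Exactly two characters, as long as they are not "ab".
# DOTALL makes '.' accept a newline as well, i.e. truly any character.
ANY_EXCEPT_AB = re.compile(r"a[^b]|[^a].", re.DOTALL)


def matches_whole(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None
```

With these definitions, matches_whole(ANY_EXCEPT_AB, "ac") and matches_whole(ANY_EXCEPT_AB, "ba") are true, while matches_whole(ANY_EXCEPT_AB, "ab") is false. The two alternatives split the cases by the first character, which is why no multi-character negation is needed.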
parser rules
Inside parser rules, ~
negates a certain token, or more than one token. For example, you have the following tokens defined:
A : 'A';
B : 'B';
C : 'C';
D : 'D';
E : 'E';
If you now want to match any token except A, you do:
p : ~A ;
And if you want to match any token except B and D, you can do:
p : ~(B | D) ;
However, if you want to match any two tokens other than A followed by B, you cannot do:
p : ~(A B) ;
Just as with lexer rules, you cannot negate a sequence of more than one token. To accomplish the above, you need to do:
p
: A ~B // A followed by any token other than B
| ~A . // any token other than A followed by any token
;
Note that the . (DOT) character in a parser rule does not match any character, as it does inside lexer rules. Inside parser rules, it matches any token (A, B, C, D or E, in this case).
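The token-level semantics can be modelled as set operations over the token vocabulary. The following Python sketch is an assumption-laden illustration, not how ANTLR is implemented: TOKENS, negate and matches_pair are hypothetical names, and tokens are represented simply by their type names.

```python
# Token vocabulary taken from the example token definitions.
TOKENS = {"A", "B", "C", "D", "E"}


def negate(*excluded):
    """Model ~(X | Y | ...): every token type except the excluded ones."""
    return TOKENS - set(excluded)


def matches_pair(first, second):
    """Model the two-alternative rule  A ~B | ~A .  on a pair of token types."""
    if first not in TOKENS or second not in TOKENS:
        return False
    first_alt = first == "A" and second in negate("B")  # A ~B
    second_alt = first in negate("A")                   # ~A .  ('.' is any token)
    return first_alt or second_alt
```

Here negate("B", "D") yields the set of A, C and E, which mirrors ~(B | D). matches_pair("A", "B") is false, while every other pair of known tokens is accepted. Negation stays a complement over single tokens, and sequences are handled by splitting into alternatives.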
Note that you cannot negate parser rules. The following is illegal:
p : ~a ;
a : A ;
Source: https://stackoverflow.com/questions/8284919/negating-inside-lexer-and-parser-rules