What is a 'semantic predicate' in ANTLR?

后端 未结 2 1922
死守一世寂寞
死守一世寂寞 2020-11-22 08:21

What is a semantic predicate in ANTLR?

相关标签:
2条回答
  • 2020-11-22 08:47

    I've always used the terse reference to ANTLR predicates on wincent.com as my guide.

    0 讨论(0)
  • 2020-11-22 08:49

    ANTLR 4

    For predicates in ANTLR 4, checkout these stackoverflow Q&A's:

    • Syntax of semantic predicates in Antlr4
    • Semantic predicates in ANTLR4?

    ANTLR 3

    A semantic predicate is a way to enforce extra (semantic) rules upon grammar actions using plain code.

    There are 3 types of semantic predicates:

    • validating semantic predicates;
    • gated semantic predicates;
    • disambiguating semantic predicates.

    Example grammar

    Let's say you have a block of text consisting of only numbers separated by comma's, ignoring any white spaces. You would like to parse this input making sure that the numbers are at most 3 digits "long" (at most 999). The following grammar (Numbers.g) would do such a thing:

    grammar Numbers;
    
    // entry point of this parser: it parses an input string consisting of at least 
    // one number, optionally followed by zero or more comma's and numbers
    parse
      :  number (',' number)* EOF
      ;
    
    // matches a number that is between 1 and 3 digits long
    number
      :  Digit Digit Digit
      |  Digit Digit
      |  Digit
      ;
    
    // matches a single digit
    Digit
      :  '0'..'9'
      ;
    
    // ignore spaces
    WhiteSpace
      :  (' ' | '\t' | '\r' | '\n') {skip();}
      ;
    

    Testing

    The grammar can be tested with the following class:

    import org.antlr.runtime.*;
    
    public class Main {
        public static void main(String[] args) throws Exception {
            ANTLRStringStream in = new ANTLRStringStream("123, 456, 7   , 89");
            NumbersLexer lexer = new NumbersLexer(in);
            CommonTokenStream tokens = new CommonTokenStream(lexer);
            NumbersParser parser = new NumbersParser(tokens);
            parser.parse();
        }
    }
    

    Test it by generating the lexer and parser, compiling all .java files and running the Main class:

    java -cp antlr-3.2.jar org.antlr.Tool Numbers.g
    javac -cp antlr-3.2.jar *.java
    java -cp .:antlr-3.2.jar Main
    

    When doing so, nothing is printed to the console, which indicates that nothing went wrong. Try changing:

    ANTLRStringStream in = new ANTLRStringStream("123, 456, 7   , 89");
    

    into:

    ANTLRStringStream in = new ANTLRStringStream("123, 456, 7777   , 89");
    

    and do the test again: you will see an error appearing on the console right after the string 777.


    Semantic Predicates

    This brings us to the semantic predicates. Let's say you want to parse numbers between 1 and 10 digits long. A rule like:

    number
      :  Digit Digit Digit Digit Digit Digit Digit Digit Digit Digit
      |  Digit Digit Digit Digit Digit Digit Digit Digit Digit
         /* ... */
      |  Digit Digit Digit
      |  Digit Digit
      |  Digit
      ;
    

    would become cumbersome. Semantic predicates can help simplify this type of rule.


    1. Validating Semantic Predicates

    A validating semantic predicate is nothing more than a block of code followed by a question mark:

    RULE { /* a boolean expression in here */ }?
    

    To solve the problem above using a validating semantic predicate, change the number rule in the grammar into:

    number
    @init { int N = 0; }
      :  (Digit { N++; } )+ { N <= 10 }?
      ;
    

    The parts { int N = 0; } and { N++; } are plain Java statements of which the first is initialized when the parser "enters" the number rule. The actual predicate is: { N <= 10 }?, which causes the parser to throw a FailedPredicateException whenever a number is more than 10 digits long.

    Test it by using the following ANTLRStringStream:

    // all equal or less than 10 digits
    ANTLRStringStream in = new ANTLRStringStream("1,23,1234567890"); 
    

    which produces no exception, while the following does thow an exception:

    // '12345678901' is more than 10 digits
    ANTLRStringStream in = new ANTLRStringStream("1,23,12345678901");
    

    2. Gated Semantic Predicates

    A gated semantic predicate is similar to a validating semantic predicate, only the gated version produces a syntax error instead of a FailedPredicateException.

    The syntax of a gated semantic predicate is:

    { /* a boolean expression in here */ }?=> RULE
    

    To instead solve the above problem using gated predicates to match numbers up to 10 digits long you would write:

    number
    @init { int N = 1; }
      :  ( { N <= 10 }?=> Digit { N++; } )+
      ;
    

    Test it again with both:

    // all equal or less than 10 digits
    ANTLRStringStream in = new ANTLRStringStream("1,23,1234567890"); 
    

    and:

    // '12345678901' is more than 10 digits
    ANTLRStringStream in = new ANTLRStringStream("1,23,12345678901");
    

    and you will see the last on will throw an error.


    3. Disambiguating Semantic Predicates

    The final type of predicate is a disambiguating semantic predicate, which looks a bit like a validating predicate ({boolean-expression}?), but acts more like a gated semantic predicate (no exception is thrown when the boolean expression evaluates to false). You can use it at the start of a rule to check some property of a rule and let the parser match said rule or not.

    Let's say the example grammar creates Number tokens (a lexer rule instead of a parser rule) that will match numbers in the range of 0..999. Now in the parser, you'd like to make a distinction between low- and hight numbers (low: 0..500, high: 501..999). This could be done using a disambiguating semantic predicate where you inspect the token next in the stream (input.LT(1)) to check if it's either low or high.

    A demo:

    grammar Numbers;
    
    parse
      :  atom (',' atom)* EOF
      ;
    
    atom
      :  low  {System.out.println("low  = " + $low.text);}
      |  high {System.out.println("high = " + $high.text);}
      ;
    
    low
      :  {Integer.valueOf(input.LT(1).getText()) <= 500}? Number
      ;
    
    high
      :  Number
      ;
    
    Number
      :  Digit Digit Digit
      |  Digit Digit
      |  Digit
      ;
    
    fragment Digit
      :  '0'..'9'
      ;
    
    WhiteSpace
      :  (' ' | '\t' | '\r' | '\n') {skip();}
      ;
    

    If you now parse the string "123, 999, 456, 700, 89, 0", you'd see the following output:

    low  = 123
    high = 999
    low  = 456
    high = 700
    low  = 89
    low  = 0
    
    0 讨论(0)
提交回复
热议问题