ANTLR parsing MismatchedTokenException

无人共我 2021-01-25 00:42

I'm trying to write a simple parser for an even simpler language that I'm writing. It's composed of postfix expressions. As of now, I'm having issues with the parser. When I

2 answers
  •  一个人的身影
    2021-01-25 01:40

    A couple of things are not correct:

    1

    You've put the WS token on the HIDDEN channel, which makes WS tokens unavailable to parser rules. So all the WS references inside your body rule are incorrect.
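
    As a rough illustration (a Python 3 sketch, not ANTLR's actual implementation; `Token`, `DEFAULT`, `HIDDEN`, `lex` and `parser_view` are names made up for this example), this is the general idea of what happens to off-channel tokens before a parser rule gets to see them:

```python
from collections import namedtuple

# Hypothetical stand-ins for illustration only; not ANTLR runtime classes.
Token = namedtuple("Token", ["type", "text", "channel"])
DEFAULT, HIDDEN = 0, 99  # arbitrary channel numbers for this sketch

def lex(text):
    """Tokenize text, tagging each whitespace character as a HIDDEN WS token."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            tokens.append(Token("WS", text[i], HIDDEN))
            i += 1
        else:
            j = i
            while j < len(text) and not text[j].isspace():
                j += 1
            word = text[i:j]
            kind = "INT" if word.isdigit() else "OTHER"
            tokens.append(Token(kind, word, DEFAULT))
            i = j
    return tokens

def parser_view(tokens):
    """Return only the tokens that a parser rule would be handed."""
    return [t for t in tokens if t.channel == DEFAULT]

all_tokens = lex("1 2 +")
visible = parser_view(all_tokens)
print([t.type for t in all_tokens])  # the lexer still produces WS tokens
print([t.type for t in visible])     # the parser never sees them
```

    Because WS never shows up in the parser's view, a parser rule that requires a WS token can never match.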

    2

    _(Your latest edit removed the left-recursion issue, but your other question still has a left-recursive rule (expr), so I'll leave this point in.)_

    ANTLR is an LL parser generator, so you can't create left-recursive grammars. The following is left-recursive:

    expr
      :  term term operator
      ;
    
    term
      :  INT
      |  ID
      |  expr
      ;
    

    because the first term inside the expr rule could possibly match an expr itself. Like any LL parser, an ANTLR-generated parser cannot cope with left recursion.
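
    To make the problem concrete, here is a hand-written recursive-descent sketch in Python 3. It is not what ANTLR generates; the function names only mirror the rules above, and `parse_terminates` is a hypothetical helper. The sketch deliberately tries the expr alternative of term first, to expose what goes wrong: expr calls term, which calls expr again before a single token has been consumed.

```python
def expr(tokens, pos):
    # expr : term term operator
    pos = term(tokens, pos)
    pos = term(tokens, pos)
    return operator(tokens, pos)

def term(tokens, pos):
    # term : INT | ID | expr
    # For illustration, the `expr` alternative is tried unconditionally.
    # No token has been consumed yet, so `pos` never advances.
    return expr(tokens, pos)

def operator(tokens, pos):
    # Never reached in this sketch; included to mirror the grammar.
    if pos < len(tokens) and tokens[pos] in ("*", "+", "/", "%", "-"):
        return pos + 1
    raise SyntaxError("operator expected at position %d" % pos)

def parse_terminates(tokens):
    """Return True if parsing finishes, False if it recurses without end."""
    try:
        expr(tokens, 0)
        return True
    except RecursionError:
        return False

print(parse_terminates(["1", "2", "+"]))
```

    The call chain expr, term, expr, term, ... repeats at the same input position until Python gives up with a RecursionError, which is the hand-written equivalent of the loop an LL parser would get stuck in.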

    3

    If you fix the WS issue, your body rule will produce the following error message:

    (1/7) Decision can match input such as "INT" using multiple alternatives

    This means that the parser cannot "see" to which rule the INT token belongs. The cause is that every alternative of your body rule can be repeated zero or more times, and expr and nested are repeated as well. All of them can match an INT, which is what ANTLR is complaining about. If you remove the *'s like this:

    body
        :   nested
        |   var
        |   get
        ;
    
    // ...
    
    expr
        :   term (term operator)
        ;
    
    nested
        :   expr (expr operator)
        ;
    

    the errors would disappear (although that would still not cause your input to be parsed properly!).

    I realize that this might still sound vague, but it's not trivial to explain (or comprehend if you're new to all this).

    4

    To properly account for a recursive expr inside expr, you'll need to steer clear of left recursion, as I explained in #2. You can do that like this:

    expr
      :  term (expr operator | term operator)*
      ;
    

    which is still ambiguous but, AFAIK, that is unavoidable when describing a postfix expression with an LL grammar. To resolve this, you could enable global backtracking inside the options { ... } section of the grammar:

    options {
      language=Python;
      output=AST;
      backtrack=true;
    }
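
    The idea behind backtracking can be sketched in a few lines of Python 3. This is only an illustration of "try the alternatives in order and rewind on failure", not ANTLR's generated code; `sequence` and `choose` are hypothetical helpers, and tokens are represented by plain type names.

```python
def sequence(*expected):
    """Build a matcher for a fixed sequence of token types."""
    def match(tokens, pos):
        for kind in expected:
            if pos >= len(tokens) or tokens[pos] != kind:
                raise SyntaxError("expected %s at position %d" % (kind, pos))
            pos += 1
        return pos  # position just after the matched sequence
    return match

def choose(alternatives, tokens, pos):
    """Try each alternative in order, always restarting from `pos`."""
    for index, alternative in enumerate(alternatives):
        try:
            return index, alternative(tokens, pos)
        except SyntaxError:
            continue  # rewind: the next attempt starts at `pos` again
    raise SyntaxError("no alternative matched at position %d" % pos)

alternatives = [
    sequence("INT", "INT", "OP"),  # shaped like: term term operator
    sequence("INT", "OP"),         # shaped like: term operator
]

# Both alternatives start with INT, so one token of lookahead cannot decide.
chosen, end = choose(alternatives, ["INT", "OP"], 0)
print(chosen, end)
```

    The first alternative fails on its second token, the position is rewound, and the second alternative succeeds. The price is that the same tokens may be scanned more than once.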
    

    Demo

    A little demo of how to parse recursive expressions could look like this:

    grammar star;
    
    options {
      language=Python;
      output=AST;
      backtrack=true;
    }
    
    parse
      :  expr EOF -> expr
      ;
    
    expr
      :  (term -> term) ( expr2 operator -> ^(operator $expr expr2) 
                        | term operator  -> ^(operator term term)
                        )*
      ;
    
    expr2 
      :  expr
      ;
    
    term
      :  INT
      |  ID
      ;
    
    operator 
      :  ('*' | '+' | '/' | '%' | '-')
      ;
    
    ID
      :  ('a'..'z' | 'A'..'Z') ('a'..'z' | '0'..'9' | 'A'..'Z')*
      ;
    
    INT
      :  '0'..'9'+
      ;
    
    WS
      :  (' ' | '\n' | '\t' | '\r') {$channel=HIDDEN;}
      ;
    

    The test script:

    #!/usr/bin/env python
    import antlr3
    from antlr3 import *
    from antlr3.tree import *
    from starLexer import *
    from starParser import *
    
    def print_level_order(tree, indent):
      print '{0}{1}'.format('   '*indent, tree.text)
      for child in tree.getChildren():
        print_level_order(child, indent+1)
    
    input = "5 1 2 + 4 * + 3 -"
    char_stream = antlr3.ANTLRStringStream(input)
    lexer = starLexer(char_stream)
    tokens = antlr3.CommonTokenStream(lexer)
    parser = starParser(tokens)
    tree = parser.parse().tree 
    print_level_order(tree, 0)
    

    produces the following output:

    -
       +
          5
          *
             +
                1
                2
             4
       3

    which corresponds to the following AST:

    [image: diagram of the AST]
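
    As a cross-check that does not need ANTLR at all, the same tree can be built with the classic operand-stack approach to postfix notation. This is a separate Python 3 sketch, not part of the grammar above; `postfix_to_tree` and `format_tree` are names invented for it, and the tree is a plain (text, children) tuple rather than an ANTLR tree node.

```python
def postfix_to_tree(source):
    """Build a (text, children) tree from a space-separated postfix expression."""
    operators = {"*", "+", "/", "%", "-"}
    stack = []
    for token in source.split():
        if token in operators:
            if len(stack) < 2:
                raise ValueError("operator %r needs two operands" % token)
            right = stack.pop()  # the most recent operand is the right child
            left = stack.pop()
            stack.append((token, [left, right]))
        else:
            stack.append((token, []))  # operand: a leaf node
    if len(stack) != 1:
        raise ValueError("expression does not reduce to a single tree")
    return stack[0]

def format_tree(tree, indent=0):
    """Return the tree as indented lines, parent before children."""
    text, children = tree
    lines = ["   " * indent + text]
    for child in children:
        lines.extend(format_tree(child, indent + 1))
    return lines

tree = postfix_to_tree("5 1 2 + 4 * + 3 -")
print("\n".join(format_tree(tree)))
```

    For the input used in the demo, this prints the same nine lines as the ANTLR test script, with `-` at the root.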
