How to get line number in ANTLR3 tree-parser @init action

僤鯓⒐⒋嵵緔 submitted on 2019-12-05 21:08:16

You can look 1 step ahead in the token/tree-stream of a tree grammar using the following: CommonTree ahead = (CommonTree)input.LT(1), which you can place in the @init section.

Every CommonTree (the default Tree implementation in ANTLR) has a getToken() method which returns the Token associated with this tree. And each Token has a getLine() method, which, not surprisingly, returns the line number of this token.

So, if you do the following:

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

you should be able to see some correct line numbers being printed. I say some, because this won't go as planned in all cases. Let me demonstrate using a simple example grammar:

grammar ASTDemo;

options { 
  output=AST;
}

tokens {
  ROOT;
  ACTION;
}

parse
  :  sentence+ EOF -> ^(ROOT sentence+)
  ;

sentence
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ID ASSIGN NUMBER -> ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  action ID -> ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

ASSIGN : '=';
START  : 'start';
STOP   : 'stop';
ID     : ('a'..'z' | 'A'..'Z')+;
NUMBER : '0'..'9'+;
SPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();};

whose tree grammar looks like:

tree grammar ASTDemoWalker;

options {
  output=AST;
  tokenVocab=ASTDemo;
  ASTLabelType=CommonTree;
}


walk
  :  ^(ROOT sentence+)
  ;

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

And if you run the following test class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "\n\n\nABC = 123\n\nstart ABC";
    ASTDemoLexer lexer = new ASTDemoLexer(new ANTLRStringStream(src));
    ASTDemoParser parser = new ASTDemoParser(new CommonTokenStream(lexer));
    CommonTree root = (CommonTree)parser.parse().getTree();
    ASTDemoWalker walker = new ASTDemoWalker(new CommonTreeNodeStream(root));
    walker.walk();
  }
}

you will see the following being printed:

line=4
line=0

As you can see, "ABC = 123" produced the expected output (line 4), but "start ABC" didn't (line 0). This is because the root of the tree built by the actionCommand rule is an ACTION token, and this token is never defined in the lexer, only in the tokens{...} block. Because it doesn't really exist in the input, line 0 is attached to it by default. If you want to change the line number, you need to provide a "reference" token as a parameter to this so-called imaginary ACTION token, from which it copies its attributes.

So, if you change the actionCommand rule in the combined grammar into:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start] action ID)
  ;

the line number would be as expected (line 6).

Note that every parser rule has a start and a stop attribute, which reference the first and last token matched by the rule, respectively. If action were a lexer rule (say FOO), then you could have omitted the .start from it:

actionCommand
  :  ref=FOO ID -> ^(ACTION[$ref] FOO ID)
  ;

Now the ACTION token has copied all attributes from whatever $ref is pointing to, except the type of the token, which is of course int ACTION. But this also means that it copied the text attribute, so in my example, the AST created by ref=action ID -> ^(ACTION[$ref.start] action ID) could look like:

            [text=start,type=ACTION]
                  /         \
                 /           \
                /             \
   [text=start,type=START]  [text=ABC,type=ID]

This is still a valid AST, because the two nodes have different types, but it makes debugging confusing since ACTION and START share the same .text attribute.

You can copy all attributes to an imaginary token except the .text and .type by providing a second string parameter, like this:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start, "Action"] action ID)
  ;

And if you now run the same test class again, you will see the following printed:

line=4
line=6
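
To make the difference between the three forms of the imaginary token concrete, here is a minimal, self-contained Java sketch. It does not use the ANTLR runtime: Tok, imaginary(...) and the numeric type codes are hypothetical stand-ins that only mimic the behaviour described above (a bare ACTION keeps line 0, ACTION[$ref.start] copies everything but the type, and ACTION[$ref.start, "Action"] also overrides the text).

```java
// Toy model only: Tok is a stand-in, NOT the ANTLR runtime's token class.
class ImaginaryTokenSketch {
    // Hypothetical type codes, chosen arbitrarily for this sketch.
    static final int START = 1, ID = 2, ACTION = 3;

    static final class Tok {
        final int type;
        final String text;
        final int line;

        Tok(int type, String text, int line) {
            this.type = type;
            this.text = text;
            this.line = line;
        }
    }

    // Mimics a bare ACTION: there is no token to copy from, so the line stays 0.
    static Tok imaginary(int type, String text) {
        return new Tok(type, text, 0);
    }

    // Mimics ACTION[$ref.start]: copy text and line, replace only the type.
    static Tok imaginary(int type, Tok ref) {
        return new Tok(type, ref.text, ref.line);
    }

    // Mimics ACTION[$ref.start, "Action"]: copy the line, replace type and text.
    static Tok imaginary(int type, Tok ref, String text) {
        return new Tok(type, text, ref.line);
    }

    public static void main(String[] args) {
        Tok start = new Tok(START, "start", 6);

        Tok bare = imaginary(ACTION, "ACTION");
        Tok copied = imaginary(ACTION, start);
        Tok renamed = imaginary(ACTION, start, "Action");

        System.out.println(bare.text + " line=" + bare.line);       // ACTION line=0
        System.out.println(copied.text + " line=" + copied.line);   // start line=6
        System.out.println(renamed.text + " line=" + renamed.line); // Action line=6
    }
}
```

In all three cases the type is ACTION; only the text and the line differ, which is why the third form gives both a usable line number and a readable node label.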

And if you inspect the tree that is generated, it'll look like this:

[type=ROOT, text='ROOT']
  [type=ASSIGN, text='=']
    [type=ID, text='ABC']
    [type=NUMBER, text='123']
  [type=ACTION, text='Action']
    [type=START, text='start']
    [type=ID, text='ABC']
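
If changing the parser grammar is not an option, another approach is to fall back to a child's line whenever the node's own token reports line 0. The sketch below shows the idea with a hypothetical Node class and effectiveLine helper (neither is part of ANTLR). As far as I recall, ANTLR 3's CommonTree.getLine() applies a similar first-child fallback, so calling ahead.getLine() instead of ahead.getToken().getLine() in the @init action may already be enough; treat that as an assumption and check it against your runtime version.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: Node is a stand-in, NOT the ANTLR runtime's CommonTree.
class LineFallbackSketch {
    static final class Node {
        final String text;
        final int tokenLine; // 0 means "no line recorded" (imaginary token)
        final List<Node> children = new ArrayList<>();

        Node(String text, int tokenLine) {
            this.text = text;
            this.tokenLine = tokenLine;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Returns the node's own line if it is non-zero; otherwise follows the
    // chain of first children until a non-zero line is found, or returns 0.
    static int effectiveLine(Node n) {
        if (n.tokenLine != 0) {
            return n.tokenLine;
        }
        if (!n.children.isEmpty()) {
            return effectiveLine(n.children.get(0));
        }
        return 0;
    }

    public static void main(String[] args) {
        // ^(ASSIGN ID NUMBER) for "ABC = 123" on line 4
        Node assign = new Node("=", 4)
                .add(new Node("ABC", 4))
                .add(new Node("123", 4));

        // ^(ACTION action ID) for "start ABC" on line 6, with an imaginary root
        Node action = new Node("ACTION", 0)
                .add(new Node("start", 6))
                .add(new Node("ABC", 6));

        System.out.println("line=" + effectiveLine(assign)); // line=4
        System.out.println("line=" + effectiveLine(action)); // line=6
    }
}
```

Note that this only inspects first children, so a subtree made up entirely of imaginary nodes still yields 0.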