How to get line number in ANTLR3 tree-parser @init action

In ANTLR, version 3, how can the line number be obtained in the @init action of a high-level tree-parser rule?

For example, in the @init action below, I'd like to push the line number along with the sentence text.

sentence
    @init { myNodeVisitor.pushScriptContext( new MyScriptContext( $sentence.text )); }
    : assignCommand 
    | actionCommand;
    finally {
        m_nodeVisitor.popScriptContext();
    }

I need to push the context before the execution of the actions associated with symbols in the rules.

Some things that don't work:

Using $sentence.line -- it's not defined, even though $sentence.text is.
Moving the paraphrase push into the rule actions. Placed before the rule, no token in the rule is available. Placed after the rule, the action happens after actions associated with the rule symbols.
Using this expression in the @init action, which compiles but returns the value 0: getTreeNodeStream().getTreeAdaptor().getToken( $sentence.start ).getLine(). EDIT: Actually, this does work, if $sentence.start is either a real token or an imaginary with a reference -- see Bart Kiers answer below.

It seems like if I can easily get, in the @init rule, the matched text and the first matched token, there should be an easy way to get the line number as well.

You can look 1 step ahead in the token/tree-stream of a tree grammar using the following: CommonTree ahead = (CommonTree)input.LT(1), which you can place in the @init section.

Every CommonTree (the default Tree implementation in ANTLR) has a getToken() method which return the Token associated with this tree. And each Token has a getLine() method, which, not surprisingly, returns the line number of this token.

So, if you do the following:

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

you should be able to see some correct line numbers being printed. I say some, because this won't go as planned in all cases. Let me demonstrate using a simple example grammar:

grammar ASTDemo;

options { 
  output=AST;
}

tokens {
  ROOT;
  ACTION;
}

parse
  :  sentence+ EOF -> ^(ROOT sentence+)
  ;

sentence
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ID ASSIGN NUMBER -> ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  action ID -> ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

ASSIGN : '=';
START  : 'start';
STOP   : 'stop';
ID     : ('a'..'z' | 'A'..'Z')+;
NUMBER : '0'..'9'+;
SPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();};

whose tree grammar looks like:

tree grammar ASTDemoWalker;

options {
  output=AST;
  tokenVocab=ASTDemo;
  ASTLabelType=CommonTree;
}


walk
  :  ^(ROOT sentence+)
  ;

sentence
@init {
  CommonTree ahead = (CommonTree)input.LT(1);
  int line = ahead.getToken().getLine();
  System.out.println("line=" + line);
}
  :  assignCommand 
  |  actionCommand
  ;

assignCommand
  :  ^(ASSIGN ID NUMBER)
  ;

actionCommand
  :  ^(ACTION action ID)
  ;

action
  :  START
  |  STOP
  ;

And if you run the following test class:

import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String src = "\n\n\nABC = 123\n\nstart ABC";
    ASTDemoLexer lexer = new ASTDemoLexer(new ANTLRStringStream(src));
    ASTDemoParser parser = new ASTDemoParser(new CommonTokenStream(lexer));
    CommonTree root = (CommonTree)parser.parse().getTree();
    ASTDemoWalker walker = new ASTDemoWalker(new CommonTreeNodeStream(root));
    walker.walk();
  }
}

you will see the following being printed:

line=4
line=0

As you can see, "ABC = 123" produced the expected output (line 4), but "start ABC" didn't (line 0). This is because the root of the action rule is a ACTION token and this token is never defined in the lexer, only in the tokens{...} block. And because it doesn't really exist in the input, by default the line 0 is attached to it. If you want to change the line number, you need to provide a "reference" token as a parameter to this so called imaginary ACTION token which it uses to copy attributes into itself.

So, if you change the actionCommand rule in the combined grammar into:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start] action ID)
  ;

the line number would be as expected (line 6).

Note that every parser rule has a start and end attribute which is a reference to the first and last token, respectively. If action was a lexer rule (say FOO), then you could have omitted the .start from it:

actionCommand
  :  ref=FOO ID -> ^(ACTION[$ref] action ID)
  ;

Now the ACTION token has copied all attributes from whatever $ref is pointing to, except the type of the token, which is of course int ACTION. But this also means that it copied the text attribute, so in my example, the AST created by ref=action ID -> ^(ACTION[$ref.start] action ID) could look like:

            [text=START,type=ACTION]
                  /         \
                 /           \
                /             \
   [text=START,type=START]  [text=ABC,type=ID]

Of course, it's a proper AST because the types of the nodes are unique, but it makes debugging confusing since ACTION and START share the same .text attribute.

You can copy all attributes to an imaginary token except the .text and .type by providing a second string parameter, like this:

actionCommand
  :  ref=action ID -> ^(ACTION[$ref.start, "Action"] action ID)
  ;

And if you now run the same test class again, you will see the following printed:

line=4
line=6

And if you inspect the tree that is generated, it'll look like this:

[type=ROOT, text='ROOT']
  [type=ASSIGN, text='=']
    [type=ID, text='ABC']
    [type=NUMBER, text='123']
  [type=ACTION, text='Action']
    [type=START, text='start']
    [type=ID, text='ABC']

来源：https://stackoverflow.com/questions/8344264/how-to-get-line-number-in-antlr3-tree-parser-init-action

标签

java

antlr

antlr3