In ANTLR, version 3, how can the line number be obtained in the @init action of a high-level tree-parser rule?
For example, in the @init action below, I'd like to push the line number along with the sentence text.
sentence
@init { myNodeVisitor.pushScriptContext( new MyScriptContext( $sentence.text )); }
: assignCommand
| actionCommand;
finally {
myNodeVisitor.popScriptContext();
}
I need to push the context before the execution of the actions associated with symbols in the rules.
Some things that don't work:
- Using $sentence.line: it's not defined, even though $sentence.text is.
- Moving the context push into the rule actions. Placed before the rule, no token in the rule is available. Placed after the rule, the action happens after the actions associated with the rule's symbols.
- Using this expression in the @init action, which compiles but returns the value 0: getTreeNodeStream().getTreeAdaptor().getToken( $sentence.start ).getLine()
EDIT: Actually, this does work if $sentence.start is either a real token or an imaginary token with a reference; see Bart Kiers's answer below.
It seems that if I can easily get the matched text and the first matched token in the @init action, there should be an easy way to get the line number as well.
You can look one step ahead in the token/tree stream of a tree grammar using CommonTree ahead = (CommonTree)input.LT(1), which you can place in the @init section.
Every CommonTree (the default Tree implementation in ANTLR) has a getToken() method, which returns the Token associated with this tree. And each Token has a getLine() method, which, not surprisingly, returns the line number of this token.
So, if you do the following:
sentence
@init {
CommonTree ahead = (CommonTree)input.LT(1);
int line = ahead.getToken().getLine();
System.out.println("line=" + line);
}
: assignCommand
| actionCommand
;
you should be able to see some correct line numbers being printed. I say some, because this won't go as planned in all cases. Let me demonstrate using a simple example grammar:
grammar ASTDemo;
options {
output=AST;
}
tokens {
ROOT;
ACTION;
}
parse
: sentence+ EOF -> ^(ROOT sentence+)
;
sentence
: assignCommand
| actionCommand
;
assignCommand
: ID ASSIGN NUMBER -> ^(ASSIGN ID NUMBER)
;
actionCommand
: action ID -> ^(ACTION action ID)
;
action
: START
| STOP
;
ASSIGN : '=';
START : 'start';
STOP : 'stop';
ID : ('a'..'z' | 'A'..'Z')+;
NUMBER : '0'..'9'+;
SPACE : (' ' | '\t' | '\r' | '\n')+ {skip();};
whose tree grammar looks like:
tree grammar ASTDemoWalker;
options {
output=AST;
tokenVocab=ASTDemo;
ASTLabelType=CommonTree;
}
walk
: ^(ROOT sentence+)
;
sentence
@init {
CommonTree ahead = (CommonTree)input.LT(1);
int line = ahead.getToken().getLine();
System.out.println("line=" + line);
}
: assignCommand
| actionCommand
;
assignCommand
: ^(ASSIGN ID NUMBER)
;
actionCommand
: ^(ACTION action ID)
;
action
: START
| STOP
;
And if you run the following test class:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;

public class Main {
    public static void main(String[] args) throws Exception {
        String src = "\n\n\nABC = 123\n\nstart ABC";
        ASTDemoLexer lexer = new ASTDemoLexer(new ANTLRStringStream(src));
        ASTDemoParser parser = new ASTDemoParser(new CommonTokenStream(lexer));
        CommonTree root = (CommonTree)parser.parse().getTree();
        ASTDemoWalker walker = new ASTDemoWalker(new CommonTreeNodeStream(root));
        walker.walk();
    }
}
you will see the following being printed:
line=4
line=0
As you can see, "ABC = 123" produced the expected output (line 4), but "start ABC" didn't (line 0). This is because the root of the tree created by the actionCommand rule is an ACTION token, and this token is never defined in the lexer, only in the tokens{...} block. Because it doesn't really exist in the input, line 0 is attached to it by default. If you want to change the line number, you need to provide a "reference" token as a parameter to this so-called imaginary ACTION token, which it uses to copy attributes into itself.
So, if you change the actionCommand rule in the combined grammar into:
actionCommand
: ref=action ID -> ^(ACTION[$ref.start] action ID)
;
the line number would be as expected (line 6).
Note that every parser rule has a start and a stop attribute, which reference the first and last token matched by the rule, respectively. If action were a lexer rule (say FOO), you could omit the .start:
actionCommand
: ref=FOO ID -> ^(ACTION[$ref] FOO ID)
;
Now the ACTION token has copied all attributes from whatever $ref is pointing to, except the type of the token, which is of course int ACTION. But this also means that it copied the text attribute, so in my example, the AST created by ref=action ID -> ^(ACTION[$ref.start] action ID) could look like:
          [text=start,type=ACTION]
               /            \
              /              \
             /                \
[text=start,type=START]   [text=ABC,type=ID]
Of course, it's a proper AST because the types of the nodes are unique, but it makes debugging confusing since ACTION and START share the same .text attribute.
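The copying behavior described above can be sketched in plain Java. This is a toy model, not ANTLR code: ToyToken and the two imaginary(...) helpers are hypothetical names invented for illustration, and the line number 6 is taken from the example input. It only mimics the idea that an imaginary token without a reference keeps line 0, while one built from a reference token copies that token's text and line and overrides only the type.

```java
// Toy model only: ToyToken and the imaginary(...) helpers are hypothetical
// stand-ins, not classes or methods from the ANTLR runtime.
class ImaginaryTokenDemo {

    static final class ToyToken {
        final String type;
        final String text;
        final int line; // 0 means "no position in the input"

        ToyToken(String type, String text, int line) {
            this.type = type;
            this.text = text;
            this.line = line;
        }

        @Override
        public String toString() {
            return "[text=" + text + ",type=" + type + ",line=" + line + "]";
        }
    }

    // Models a bare ACTION: there is nothing to copy from, so the line stays 0
    // and the text defaults to the token name.
    static ToyToken imaginary(String type) {
        return new ToyToken(type, type, 0);
    }

    // Models ACTION[$ref.start]: copies text and line from the reference
    // token and overrides only the type.
    static ToyToken imaginary(String type, ToyToken ref) {
        return new ToyToken(type, ref.text, ref.line);
    }

    public static void main(String[] args) {
        ToyToken start = new ToyToken("START", "start", 6);
        System.out.println(imaginary("ACTION"));
        System.out.println(imaginary("ACTION", start));
    }
}
```

Running it prints [text=ACTION,type=ACTION,line=0] followed by [text=start,type=ACTION,line=6]: the second token has the right line, but it also carries the text of the START token.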
You can copy all attributes to an imaginary token except .text and .type by providing a second string parameter, which becomes the token's text, like this:
actionCommand
: ref=action ID -> ^(ACTION[$ref.start, "Action"] action ID)
;
And if you now run the same test class again, you will see the following printed:
line=4
line=6
And if you inspect the tree that is generated, it'll look like this:
[type=ROOT, text='ROOT']
  [type=ASSIGN, text='=']
    [type=ID, text='ABC']
    [type=NUMBER, text='123']
  [type=ACTION, text='Action']
    [type=START, text='start']
    [type=ID, text='ABC']
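If changing the parser grammar is not an option, one possible fallback is to search the subtree for the first node that carries a real line number. The sketch below assumes that approach and uses a hypothetical Node class rather than the ANTLR runtime; firstKnownLine is an invented helper name. With real CommonTree objects, the same walk would presumably be written with getToken().getLine(), getChildCount() and getChild(i).

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: Node is a hypothetical stand-in for a tree node, not an ANTLR class.
class FirstLineDemo {

    static final class Node {
        final String text;
        final int line; // 0 means the node's token has no position in the input
        final List<Node> children = new ArrayList<>();

        Node(String text, int line, Node... kids) {
            this.text = text;
            this.line = line;
            for (Node kid : kids) {
                children.add(kid);
            }
        }
    }

    // Returns the node's own line if it is known; otherwise the first known
    // line found in a depth-first, left-to-right walk of its descendants;
    // otherwise 0.
    static int firstKnownLine(Node node) {
        if (node.line > 0) {
            return node.line;
        }
        for (Node child : node.children) {
            int line = firstKnownLine(child);
            if (line > 0) {
                return line;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        // An imaginary ACTION root (line 0) above two real tokens on line 6.
        Node action = new Node("ACTION", 0, new Node("start", 6), new Node("ABC", 6));
        System.out.println("line=" + firstKnownLine(action));
    }
}
```

The trade-off is that the walker does extra work on every call, whereas passing a reference token in the parser grammar fixes the line number once, when the tree is built.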
Source: https://stackoverflow.com/questions/8344264/how-to-get-line-number-in-antlr3-tree-parser-init-action