How can I modify the text of tokens in a CommonTokenStream with ANTLR?

喜你入骨 submitted on 2019-12-03 08:08:19

ANTLR 3 has a way to do this in its grammar file.

Let's say you're parsing a string consisting of numbers and strings delimited by commas. A grammar would look like this:

grammar Foo;

parse
  :  value ( ',' value )* EOF
  ;

value
  :  Number
  |  String
  ;

String
  :  '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
  ;

Number
  :  '0'..'9'+
  ;

Space
  :  ( ' ' | '\t' ) {skip();}
  ;

This should all look familiar to you. Let's say you want to wrap square brackets around all integer values. Here's how to do that:

grammar Foo;

options {output=template; rewrite=true;} 

parse
  :  value ( ',' value )* EOF
  ;

value
  :  n=Number -> template(num={$n.text}) "[<num>]" 
  |  String
  ;

String
  :  '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
  ;

Number
  :  '0'..'9'+
  ;

Space
  :  ( ' ' | '\t' ) {skip();}
  ;

As you can see, I've added some options at the top, and added a rewrite rule (everything after the ->) after the Number in the value parser rule.

Now to test it all, compile and run this class:

import org.antlr.runtime.*;

public class FooTest {
  public static void main(String[] args) throws Exception {
    String text = "12, \"34\", 56, \"a\\\"b\", 78";
    System.out.println("parsing: "+text);
    ANTLRStringStream in = new ANTLRStringStream(text);
    FooLexer lexer = new FooLexer(in);
    CommonTokenStream tokens = new TokenRewriteStream(lexer); // Note: a TokenRewriteStream!
    FooParser parser = new FooParser(tokens);
    parser.parse();
    System.out.println("tokens: "+tokens.toString());
  }
}

which produces:

parsing: 12, "34", 56, "a\"b", 78
tokens: [12],"34",[56],"a\"b",[78]

In ANTLR 4 there is a new facility using parse tree listeners and TokenStreamRewriter (note the name difference) that can be used to observe or transform trees. (The replies suggesting TokenRewriteStream apply to ANTLR 3 and will not work with ANTLR 4.)

In ANTLR 4 an XXXBaseListener class is generated for you with callbacks for entering and exiting each non-terminal node in the grammar (e.g. enterClassDeclaration() ).

You can use the Listener in two ways:

1) As an observer - By simply overriding the methods to produce arbitrary output related to the input text, e.g. override enterClassDeclaration() and output a line for each class declared in your program.

2) As a transformer - Using TokenStreamRewriter to modify the original text as it passes through. To do this, you use the rewriter to make modifications (add, delete, or replace tokens) in the callback methods, and you use the rewriter at the end to output the modified text.
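
To make the two roles concrete, here is a minimal plain-Java sketch. It deliberately does not use the ANTLR runtime: ToyRewriter, ToyListener and ToyListenerDemo are hypothetical stand-ins invented for this illustration, and the token list is hard-coded rather than produced by a lexer. It only models the idea described above: callbacks record edits by token index, the original tokens are left alone, and the modified text is produced at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy stand-in for a rewriter (not the ANTLR class): edits are recorded, not applied in place. */
class ToyRewriter {
    private final List<String> tokens;                                 // original token texts, never mutated
    private final Map<Integer, String> replacements = new TreeMap<>(); // token index -> new text

    ToyRewriter(List<String> tokens) {
        this.tokens = new ArrayList<>(tokens);
    }

    /** Record that the token at this index should be rendered with different text. */
    void replace(int index, String text) {
        replacements.put(index, text);
    }

    /** Render the stream, substituting any recorded replacement for the original text. */
    String getText() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            out.append(replacements.getOrDefault(i, tokens.get(i)));
        }
        return out.toString();
    }
}

/** Hypothetical callback interface, loosely modelled on a generated listener. */
interface ToyListener {
    void enterValue(int tokenIndex, String text);
}

class ToyListenerDemo {
    /** Stand-in for a tree walk: fires the callback for every token that is not a comma. */
    static void walk(List<String> tokens, ToyListener listener) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(",")) {
                listener.enterValue(i, tokens.get(i));
            }
        }
    }

    /** Transformer use: the callback records an edit for each all-digit value. */
    static String bracketNumbers(List<String> tokens) {
        ToyRewriter rewriter = new ToyRewriter(tokens);
        walk(tokens, (index, text) -> {
            if (text.matches("[0-9]+")) {
                rewriter.replace(index, "[" + text + "]");
            }
        });
        return rewriter.getText();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("12", ",", "\"34\"", ",", "56");
        // Observer use: the callback only reports what it sees.
        walk(tokens, (index, text) -> System.out.println("value at " + index + ": " + text));
        // Transformer use: prints [12],"34",[56]
        System.out.println(bracketNumbers(tokens));
    }
}
```

With the real ANTLR 4 runtime, the walk would be driven by the generated parse tree and the edits would go through a TokenStreamRewriter instead of this toy class; the linked book examples show that version.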

See the following examples from the ANTLR 4 book for how to do transformations:

https://github.com/mquinn/ANTLR4/blob/master/book_code/tour/InsertSerialIDListener.java

and

https://github.com/mquinn/ANTLR4/blob/master/book_code/tour/InsertSerialID.java

The other given example of changing the text in the lexer works well if you want to globally replace the text in all situations. However, you often only want to replace a token's text in certain situations.

Using the TokenRewriteStream allows you the flexibility of changing the text only during certain contexts.

This can be done using a subclass of the token stream class you were using. Instead of using the CommonTokenStream class you can use the TokenRewriteStream.

So you'd have the TokenRewriteStream consume the lexer and then you'd run your parser.

In your grammar typically you'd do the replacement like this:

/** Convert "int foo() {...}" into "float foo();" */
function
:
{
    RefTokenWithIndex t(LT(1));  // copy the location of the token you want to replace
    engine.replace(t, "float");
}
type id:ID LPAREN (formalParameter (COMMA formalParameter)*)? RPAREN
    block[true]
;

Here we've replaced the token int that we matched with the text float. The location information is preserved but the text it "matches" has been changed.
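
As a rough illustration of context-dependent replacement, the following plain-Java sketch replaces int with float only where it looks like a function's return type (int, then an identifier, then an opening parenthesis). It is not ANTLR code: the ContextualReplace class, the hard-coded token list and the three-token lookahead check are all assumptions made for this example, and unlike the grammar action above it does not drop the function body.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: replaces a token's text only in one specific context. */
class ContextualReplace {
    /** Replace "int" with "float" only when it is followed by an identifier and "(". */
    static String rewriteReturnTypes(List<String> tokens) {
        Map<Integer, String> edits = new TreeMap<>();   // token index -> replacement text
        for (int i = 0; i + 2 < tokens.size(); i++) {
            boolean isReturnType = tokens.get(i).equals("int")
                    && tokens.get(i + 1).matches("[A-Za-z_]\\w*")
                    && tokens.get(i + 2).equals("(");
            if (isReturnType) {
                edits.put(i, "float");                  // original token list is not modified
            }
        }
        // Render the tokens separated by single spaces, applying the recorded edits.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.size(); i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(edits.getOrDefault(i, tokens.get(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("int", "foo", "(", "int", "x", ")", ";");
        // Prints: float foo ( int x ) ;
        System.out.println(rewriteReturnTypes(tokens));
    }
}
```

Only the first int is rewritten; the parameter type keeps its original text because it is not followed by an identifier and an opening parenthesis.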

To check your token stream afterwards, you would use the same code as before.
