I'm trying to learn ANTLR and at the same time use it for a current project.
I've gotten to the point where I can run the lexer on a chunk of code and output it to a C
ANTLR has a way to do this in its grammar file.
Let's say you're parsing a string consisting of numbers and strings delimited by commas. A grammar would look like this:
grammar Foo;
parse
: value ( ',' value )* EOF
;
value
: Number
| String
;
String
: '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
;
Number
: '0'..'9'+
;
Space
: ( ' ' | '\t' ) {skip();}
;
This should all look familiar to you. Let's say you want to wrap square brackets around all integer values. Here's how to do that:
grammar Foo;
options {output=template; rewrite=true;}
parse
: value ( ',' value )* EOF
;
value
: n=Number -> template(num={$n.text}) "[<num>]"
| String
;
String
: '"' ( ~( '"' | '\\' ) | '\\\\' | '\\"' )* '"'
;
Number
: '0'..'9'+
;
Space
: ( ' ' | '\t' ) {skip();}
;
As you see, I've added some options at the top, and added a rewrite rule (everything after the ->) after the Number in the value parser rule.
Now to test it all, compile and run this class:
import org.antlr.runtime.*;

public class FooTest {
    public static void main(String[] args) throws Exception {
        String text = "12, \"34\", 56, \"a\\\"b\", 78";
        System.out.println("parsing: "+text);
        ANTLRStringStream in = new ANTLRStringStream(text);
        FooLexer lexer = new FooLexer(in);
        CommonTokenStream tokens = new TokenRewriteStream(lexer); // Note: a TokenRewriteStream!
        FooParser parser = new FooParser(tokens);
        parser.parse();
        System.out.println("tokens: "+tokens.toString());
    }
}
which produces:
parsing: 12, "34", 56, "a\"b", 78
tokens: [12],"34",[56],"a\"b",[78]
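
If it helps to see why a TokenRewriteStream is needed, here is a rough plain-Java model of the idea, with no ANTLR dependency. It is only a sketch under assumptions: the class RewriteSketch and its methods tokenize and wrapNumbers are names I made up for illustration, not part of the ANTLR runtime. The hand-written tokenizer only approximates the lexer rules above, and it does not check the comma-separated structure that the parse rule enforces. The point it shows is that the original tokens stay untouched, replacements are recorded per token index, and the rewritten text is only assembled when the result is rendered.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the generated lexer/parser plus a rewrite stream.
// None of these names come from the ANTLR runtime.
class RewriteSketch {

    private static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Splits the input into token texts, roughly mirroring the lexer rules:
    // Number, String, ',' and skipped spaces/tabs.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == ' ' || c == '\t') {
                i++; // comparable to {skip();} in the Space rule
            } else if (c == ',') {
                tokens.add(",");
                i++;
            } else if (isAsciiDigit(c)) {
                int start = i;
                while (i < input.length() && isAsciiDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add(input.substring(start, i));
            } else if (c == '"') {
                int start = i++;
                while (i < input.length() && input.charAt(i) != '"') {
                    // Simplification: a backslash escapes whatever character follows,
                    // whereas the grammar only allows \\ and \" as escapes.
                    i += (input.charAt(i) == '\\') ? 2 : 1;
                }
                if (i >= input.length()) {
                    throw new IllegalArgumentException("unterminated string at " + start);
                }
                i++; // consume the closing quote
                tokens.add(input.substring(start, i));
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    // Wraps every number token in square brackets without modifying the token list.
    static String wrapNumbers(String input) {
        List<String> tokens = tokenize(input);

        // Replacement text recorded per token index.
        Map<Integer, String> replacements = new HashMap<>();
        for (int idx = 0; idx < tokens.size(); idx++) {
            String t = tokens.get(idx);
            if (isAsciiDigit(t.charAt(0))) {
                replacements.put(idx, "[" + t + "]");
            }
        }

        // Render: use the replacement where one was recorded, else the original text.
        StringBuilder out = new StringBuilder();
        for (int idx = 0; idx < tokens.size(); idx++) {
            out.append(replacements.getOrDefault(idx, tokens.get(idx)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "12, \"34\", 56, \"a\\\"b\", 78";
        System.out.println("parsing: " + text);
        System.out.println("tokens: " + wrapNumbers(text));
    }
}
```

For the same input as in FooTest, wrapNumbers returns the same string as the tokens: line above. As in the grammar, whitespace never becomes a token, which is why the spaces after the commas are missing from the output.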