问题
I am trying to write a translator for a Java like language to multiple languages.
Right now I am facing 2 problems:
First is to decompose complex expression in a sequence of basic operation and then translating them to destination language.
For example my starting language can be:
var a = (ln(b) + avg(c))*2
I'd like to traslate it like:
var x1 = log_N(b);
var x2 = average(c);
var x3 = sum(x1, x2);
var a = multiply(x3, 2);
I think i have to use a Tree parser, nut I am not sure how to integrate it with StringTemplate. Moreover i am adding extra variables like x1, x2, and x3 and i don't know how to handle this situation.
The second problem is that one of my destination language is a plsql like language. In this case a need to ensure that all output variables become cursors and pass them to the functions.
For example the expression:
var a = (ln(b) + avg(c))*2
Should be translated in this way:
log_N(x1, b);
average(x2, c);
sum(x3, x1, x2);
multiply(a, x3, 2);
where x1, x2, x3 and a will become output cursors.
Can anyone help me finding the right solution?
Thanks
回答1:
I think i have to use a Tree parser, but I am not sure how to integrate it with StringTemplate.
Integrating StringTemplate into a tree parser is basically the same as integrating it into a token parser: define the output
option as template
, then write the rule productions accordingly.
Below is a small tree grammar that uses templates for output. Note that the only meaningful difference between this grammar and the one I described in a previous answer is that this one operates on tree nodes rather than tokens. The templates work the same.
tree grammar AstToTemplateParser;
options {
output = template;
tokenVocab = JavaLikeToAst;
ASTLabelType = CommonTree;
}
program
: ^(PROGRAM decls+=decl+) -> write(text={$decls})
;
decl
: ^(DECL ID ^(op args+=arg*)) -> assign(name={$ID.text}, op={$op.st}, args={$args})
| ^(DECL ID ^(CALL method args+=arg*)) -> assign(name={$ID.text}, op={$method.st}, args={$args})
;
arg
: ID -> write(text={$ID.text})
| INT -> write(text={$INT.text})
;
method
: ID -> op(name={$ID.text})
;
op : STAR -> op(name={$STAR.text})
| DIV -> op(name={$DIV.text})
| PLUS -> op(name={$PLUS.text})
| MINUS -> op(name={$MINUS.text})
;
Moreover i am adding extra variables like x1, x2, and x3 and i don't know how to handle this situation.
That's the kicker. The easiest way that I know of (which isn't terribly easy) is to first generate the baseline AST using the token parser then use a tree parser to flatten the AST into a list of declarations, each representing expressions from the original input -- x1
, x2
, and x3
in your question.
The intermediate stages look like this:
Original Input
var a = (ln(b) + avg(c))*2
Baseline AST
Flattened AST
Standard Template Applied
var var0 = log_N(b);
var var1 = average(c);
var var2 = add(var0, var1);
var a = multiply(var2, 2);
PLSQL Template Applied
log_N(var0, b);
average(var1, c);
add(var2, var0, var1);
multiply(a, var2, 2);
Here's a token parser grammar that simply generates a baseline AST that resembles the one in your question. There's nothing really noteworthy here, it just produces a typical AST.
grammar JavaLikeToAst;
options {
output = AST;
}
tokens {
PROGRAM; DECL; CALL;
}
compilationUnit : statement* EOF -> ^(PROGRAM statement*);
statement : decl;
decl : VAR ID EQ expr -> ^(DECL ID expr);
expr : add_expr;
add_expr : mul_expr ((PLUS|MINUS)^ mul_expr)*;
mul_expr : call_expr ((STAR|DIV)^ call_expr)*;
call_expr : ID LPAR arglist? RPAR -> ^(CALL ID arglist?)
| primary_expr;
arglist : expr (COMMA! expr)*;
primary_expr : ID | INT | LPAR! expr RPAR!;
VAR : 'var';
ID : ('a'..'z'|'A'..'Z')('a'..'z'|'A'..'Z'|'_'|'0'..'9')*;
INT : ('0'..'9')+;
COMMA : ',';
SEMI : ';';
LCUR : '{';
RCUR : '}';
LPAR : '(';
RPAR : ')';
EQ : '=';
PLUS : '+';
MINUS : '-';
STAR : '*';
DIV : '/';
WS : (' '|'\t'|'\f'|'\r'|'\n'){skip();};
Here is where it gets ugly (at least the way I choose to do it ;)). The tree grammar below transforms a baseline AST produced from above into a list of declarations that correspond to the expressions of the AST -- the flattened AST. Below this, I'll step through the fun bits of the grammar.
tree grammar AstToFlatAstParser;
options {
output = AST;
tokenVocab = JavaLikeToAst;
ASTLabelType = CommonTree;
filter = true;
}
@header {
import java.util.HashMap;
}
@members {
private int currentId = 0;
private HashMap<Integer, Object> exprs = new HashMap<Integer, Object>();
private boolean newDecls = false;
private int nextId() {
return currentId++;
}
private Object generateId(int id) {
return adaptor.create(ID, "var" + id);
}
private void saveExpr(int id, Object expr){
newDecls = true;
exprs.put(id, expr);
}
private Object buildNewDecls() {
Object newDecls = adaptor.nil();
for (int i = 0; i < currentId; ++i){
if (!exprs.containsKey(i)){
continue; //This id was generated but not used.
}
Object expr = exprs.get(i);
Object decl = adaptor.create(DECL, tokenNames[DECL]);
adaptor.addChild(decl, adaptor.create(ID, "var" + i));
adaptor.addChild(decl, expr);
adaptor.addChild(newDecls, decl);
}
exprs.clear();
return newDecls;
}
}
bottomup
: exit_program
| exit_op
;
exit_op
@init {
int myId = nextId();
}
: ^(binary_op reduced reduced)
{$start.parent != null && $start.parent.getType() != DECL}?
{saveExpr(myId, $start);}
-> {generateId(myId)}
| ^(CALL ID .*)
{$start.parent != null && $start.parent.getType() != DECL}?
{saveExpr(myId, $start);}
-> {generateId(myId)}
;
binary_op : STAR | DIV | PLUS | MINUS;
reduced : ID | INT;
exit_program
//Only rebuild PROGRAM if a new declaration is going to be built, that is, when "newDecls" is true.
//Otherwise PROGRAM is considered changed when it isn't and the processing never ends.
: {newDecls}? ^(PROGRAM old+=.*) {newDecls = false;}
-> ^(PROGRAM {buildNewDecls()} $old*)
;
First, notice that the grammar is mostly Java code. There are only five parser rules, and most of them are simple. This is a filter
tree grammar, so rules bottomup
and topdown
are the entry points. Only bottomup
is necessary in this case, so topdown
isn't specified. Rule bottomup
is repeated until the output tree is unchanged, which to us means when there are no more declarations to produce and the tree is fully flattened.
Second, notice that rule exit_program
is where the new declarations are being written out to the AST. I'm using a semantic predicate ({newDecls}?
) to ensure that PROGRAM
is only modified when new declarations are added. Remember how I said bottomup
is called until no more changes are made? Without this semantic predicate, exit_program
will always modify PROGRAM
and the tree parsing will never stop processing bottomup
. This is a crude work-around for this special case but it works. The new declarations are inserted at the beginning of PROGRAM
to ensure that they appear before they are referenced. It's no good defining x1
ten lines after it's expected.
Third, notice that rule exit_op
replaces expressions (like ln(b)
) with declarations (like var0
). An expression is replaced if one of the following is true:
The expression is a binary operation whose operands are both "reduced" (that is, they're both integers or variable ids) and is not the child of a
DECL
node.var a = 1 + 2
is not changed because1 + 2
is the child of a declaration.var b = a + (2 + c)
is changed because(2 + c)
has two "reduced" operands and is not the child of aDECL
node (it's the child of the+
ina + ...
).The expression is a
CALL
that is not the child of aDECL
node.var a = ln(b)
is untouched, butvar a = ln(b) + 3
is changed becauseln(b)
is a child of+
.
An expression is stored in exprs
before it's replaced by an id. It's reconstituted in the exit_program
rule when the rule calls buildNewDecls
. buildNewDecls
just uses the parser's built-in TreeAdaptor
member (named adaptor
) to generate the DECL
nodes that appear in the flattened AST. The Javadoc for the adaptor methods does an adequate job explaining what the calls do, so I won't go into the details.
Caveat: The parsers produced by the grammars above work fine for the very limited case that you've presented. I don't know what errors they will produce when applied to any broader scenario.
The second problem is that one of my destination language is a plsql like language. In this case a need to ensure that all output variables become cursors and pass them to the functions...
That would be something that the templates could manage for you once your AST is just a flat list of declarations, as shown above.
You'll pass the flattened AST to a template-based tree parser, like the one up top, to produce the different text versions like those you've listed. In this case, a template would accept all the parts of a declaration – the variable name, the operation/method name, and the operands/arguments – and produce text like either variable = method(arg0, arg1)
or method(variable, arg0, arg1)
, depending on the template used. The key is to ensure that the input is flat and that the template receives everything related to the declaration.
Here is a test application that ties it all together.
JavaLikeToAstTest.java
public class JavaLikeToAstTest {
public static void main(String[] args) throws Exception {
final String code = "var a = (ln(b) + avg(c))*2";
CharStream input = new ANTLRStringStream(code);
JavaLikeToAstLexer lexer = new JavaLikeToAstLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
JavaLikeToAstParser parser = new JavaLikeToAstParser(tokens);
JavaLikeToAstParser.compilationUnit_return result = parser
.compilationUnit();
if (lexer.getNumberOfSyntaxErrors() > 0 || parser.getNumberOfSyntaxErrors() > 0) {
throw new Exception("Syntax Errors encountered!");
}
CommonTree tree = (CommonTree) result.tree;
System.out.printf("Baseline AST: %s%n%n", tree.toStringTree());
tree = flatten(tree);
System.out.printf("Flattened AST: %s%n%n", tree.toStringTree());
translate(tree, "AstToPlsql.stg");
translate(tree, "AstToGlobal.stg");
}
private static CommonTree flatten(CommonTree tree) {
AstToFlatAstParser parser = new AstToFlatAstParser(
new CommonTreeNodeStream(tree));
return (CommonTree) parser.downup(tree, true);
}
private static void translate(CommonTree tree, String templateResourceName)
throws Exception {
AstToTemplateParser parser = new AstToTemplateParser(
new CommonTreeNodeStream(tree));
InputStream stream = JavaLikeToTemplateTest.class
.getResourceAsStream(templateResourceName);
Reader reader = new InputStreamReader(stream);
parser.setTemplateLib(new StringTemplateGroup(reader));
reader.close();
stream.close();
System.out.printf("Result for %s%n%n%s%n%n", templateResourceName,
parser.program().st.toString());
}
Here are two simple StringTemplate group files to handle the translation process.
AstToGlobal.stg
group AstToGlobal;
methods ::= ["*":"multiply", "/":"divide", "+":"add", "-":"subtract", "avg":"average", "ln":"log_N", default:key]
assign(name, op, args) ::= <<var <name> = <op>(<args;separator=", ">) >>
op(name) ::= "<methods.(name)>"
write(text) ::= << <text;separator="\n"> >>
AstToPlsql.stg
group AstToPlsql;
methods ::= ["*":"multiply", "/":"divide", "+":"add", "-":"subtract", "avg":"average", "ln":"log_N", default:key]
assign(name, op, args) ::=<< <op>(<name>, <args;separator=", ">) >>
op(name) ::= "<methods.(name)>"
write(text) ::= << <text;separator="\n"> >>
The application produces the following output:
Baseline AST: (PROGRAM (DECL a (* (+ (CALL ln b) (CALL avg c)) 2)))
(CALL ln b) -> var0
(CALL avg c) -> var1
(+ var0 var1) -> var2
(PROGRAM (DECL a (* var2 2))) -> (PROGRAM (DECL var0 (CALL ln b)) (DECL var1 (CALL avg c)) (DECL var2 (+ var0 var1)) (DECL a (* var2 2)))
Flattened AST: (PROGRAM (DECL var0 (CALL ln b)) (DECL var1 (CALL avg c)) (DECL var2 (+ var0 var1)) (DECL a (* var2 2)))
Result for AstToPlsql.stg
log_N(var0, b )
average(var1, c )
add(var2, var0 , var1 )
multiply(a, var2 , 2 )
Result for AstToGlobal.stg
var var0 = log_N(b )
var var1 = average(c )
var var2 = add(var0 , var1 )
var a = multiply(var2 , 2 )
There is no code in AstToTemplate.g
or the templates to handle simple assignments like var a = 3
, but I think it will be easy enough to add the code to handle it using the op/method assignment as a guide.
来源:https://stackoverflow.com/questions/13817046/decompose-expression-in-base-operation-antlr-stringtemplate