Question
I am writing a transpiler (myLang -> JS) using ANTLR (JavaScript target with a visitor).
My focus is on the target code generation part, working from the parse tree:
that is, how to deal with variations in the source code.
To make the question clearer, consider the two variations below:
source#1:
PRINT 'hello there'
source#2:
varGreeting = 'hey!'
PRINT varGreeting
In case 1 the argument is a string; in case 2 it is a variable. The JS target code therefore has to differ (below): case 1 with quotes, case 2 without.
target#1 (JS):
console.log("hello there"); // <-- string
target#2 (JS):
var varGreeting = "hey!";
console.log(varGreeting); // <-- var
How can I best disambiguate and generate different code?
My first idea was to use the lexer rule names (ID, STRLIT) to tell the two usages apart.
But I couldn't find them exposed in the RuleContext API. I looked at the Java API, assuming the JS runtime is the same.
getText() only gives the value ('hello there', varGreeting), with no meta/attribute information I can use.
I dug into the tree/ctx object and didn't find them in an easily consumable form.
Question: how best to go about this without resorting to ugly hacks? A transpiler seems to be squarely within ANTLR's use cases, so am I missing something?
(relevant part of) Grammar:
print : PRINTKW (ID | STRLIT) NEWLINE;
STRLIT: '\'' .*? '\'' ;
ID : [a-zA-Z0-9_]+;
Visitor override:
const fs = require('fs');

// sample code for generating code for case 1 (with quotes)
// targetFs and fileName are assumed to be defined elsewhere
myVisitor.prototype.visitPrint = function(ctx) {
  const Js =
    `console.log("${ctx.getChild(1).getText()}");`;
  // ^^ this is the part which needs different treatment for case 1 and 2
  // write to file
  fs.writeFile(targetFs + fileName + '.js', Js, 'utf8', function (err) {
    if (err) return console.log(err);
    console.log(`done`);
  });
  return this.visitChildren(ctx);
};
Using ANTLR 4.8.
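To make the problem concrete, the template from visitPrint can be run on plain strings standing in for ctx.getChild(1).getText(). This is a sketch without the ANTLR runtime; the argument values are assumed from the STRLIT and ID rules, where the STRLIT text includes its single quotes.

```javascript
// Same template literal as in visitPrint, applied to the text that
// ctx.getChild(1).getText() would return for each source variation.
function questionTemplate(argText) {
  return `console.log("${argText}");`;
}

const case1 = questionTemplate("'hello there'"); // source#1: STRLIT token
const case2 = questionTemplate("varGreeting");   // source#2: ID token

console.log(case1); // console.log("'hello there'");
console.log(case2); // console.log("varGreeting");
```

Case 1 keeps the myLang single quotes inside the JS string, and case 2 prints the text varGreeting instead of the variable's value, which is why a single template cannot cover both cases.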
Answer 1:
You're using getChild(1) to access the argument of the print statement. This will give you a TerminalNode containing either an ID or STRLIT token. You can access the token using the getSymbol() method, and you can then access the token's type using the .type property. The type will be a number that you can compare against constants like MyLanguageParser.ID or MyLanguageParser.STRLIT.
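A minimal sketch of the token-type approach, runnable without the ANTLR runtime: terminalNode() is a hypothetical stand-in for a TerminalNode, and the numeric token types are made up (the generated parser defines the real values).

```javascript
// Hypothetical token-type numbers: the real values are generated from the
// grammar and exposed as static fields such as MyLanguageParser.ID.
const MyLanguageParser = { STRLIT: 3, ID: 4 };

// Stand-in for an ANTLR TerminalNode: getSymbol() returns the token,
// whose .type and .text carry the token kind and the matched text.
function terminalNode(type, text) {
  const token = { type, text };
  return { getSymbol: () => token };
}

// Translate the print argument by switching on the token type.
function translateArg(node) {
  const token = node.getSymbol();
  switch (token.type) {
    case MyLanguageParser.STRLIT:
      // 'hello there' -> "hello there"
      return `"${token.text.slice(1, -1)}"`;
    case MyLanguageParser.ID:
      // The variable name is emitted as-is, without quotes.
      return token.text;
    default:
      throw new Error(`unexpected token type ${token.type}`);
  }
}

function translatePrint(argNode) {
  return `console.log(${translateArg(argNode)});`;
}

console.log(translatePrint(terminalNode(MyLanguageParser.STRLIT, "'hello there'")));
// console.log("hello there");
console.log(translatePrint(terminalNode(MyLanguageParser.ID, "varGreeting")));
// console.log(varGreeting);
```

In a real visitor, the node passed to translatePrint would be ctx.getChild(1).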
Using getChild isn't necessarily the best way to access a node's children though. Each context class will have specific accessors for each of its children.
Specifically the PrintContext object will have methods ID() and STRLIT(). One of them will return null, the other will return a TerminalNode object containing the given token. So you know whether it was an ID or string literal by seeing which one isn't null.
That said, the more common solution would be to not have a union of possible kinds of arguments in the print rule, but instead allow any kind of expression as an argument to print. You can then use labelled alternatives in your expression rule to get different visitor methods for each kind of expression:
print : PRINTKW expression NEWLINE;

expression
    : STRLIT #StringLiteral
    | ID     #Variable
    ;
Then your visitor could look like this:
myVisitor.prototype.visitPrint = function(ctx) {
  const arg = this.visit(ctx.expression());
  const Js = `console.log(${arg});`;
  // write to file (fs, targetFs and fileName as in the question's code)
  fs.writeFile(targetFs + fileName + '.js', Js, 'utf8', function (err) {
    if (err) return console.log(err);
    console.log(`done`);
  });
};

myVisitor.prototype.visitStringLiteral = function(ctx) {
  const text = ctx.getText();
  return `"${text.substring(1, text.length - 1)}"`;
};

myVisitor.prototype.visitVariable = function(ctx) {
  return ctx.getText();
};
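The sketch below imitates, without the ANTLR runtime, how labelled alternatives route each kind of expression to its own visitor method: the hand-written context classes stand in for the ones ANTLR generates, and visit() calls accept() the way the runtime's ParseTreeVisitor.visit does. As one possible design (an assumption, not part of the answer), visitPrint here returns the generated line instead of writing it, because fs.writeFile replaces the file's contents by default, so writing once per PRINT statement would keep only the last statement.

```javascript
// Hand-written stand-ins for the context classes generated for the
// labelled alternatives; their accept() picks the matching visit method.
class StringLiteralContext {
  constructor(text) { this.text = text; }
  getText() { return this.text; }
  accept(visitor) { return visitor.visitStringLiteral(this); }
}

class VariableContext {
  constructor(text) { this.text = text; }
  getText() { return this.text; }
  accept(visitor) { return visitor.visitVariable(this); }
}

class PrintContext {
  constructor(expr) { this.expr = expr; }
  expression() { return this.expr; }
  accept(visitor) { return visitor.visitPrint(this); }
}

class SketchVisitor {
  // Let the node choose which visit method handles it.
  visit(ctx) { return ctx.accept(this); }

  // Return the generated line so the caller can collect all statements.
  visitPrint(ctx) {
    return `console.log(${this.visit(ctx.expression())});`;
  }

  visitStringLiteral(ctx) {
    const text = ctx.getText();
    return `"${text.substring(1, text.length - 1)}"`;
  }

  visitVariable(ctx) {
    return ctx.getText();
  }
}

const visitor = new SketchVisitor();
const statements = [
  new PrintContext(new StringLiteralContext("'hello there'")),
  new PrintContext(new VariableContext("varGreeting")),
];
// Join the generated lines; this string could then be written once.
const output = statements.map((s) => visitor.visit(s)).join("\n");
console.log(output);
// console.log("hello there");
// console.log(varGreeting);
```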
Alternatively you could leave out the labels and instead define a visitExpression method that handles both cases by seeing which getter returns null:
myVisitor.prototype.visitExpression = function(ctx) {
  // STRLIT() returns null when this expression is not a string literal
  if (ctx.STRLIT() !== null) {
    const text = ctx.getText();
    return `"${text.substring(1, text.length - 1)}"`;
  } else {
    return ctx.getText();
  }
};
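Note that the generated accessor has to be called: ctx.STRLIT without parentheses is the method itself, not its result. A tiny stand-in context (hypothetical, no ANTLR runtime) shows the difference:

```javascript
// Stand-in for an unlabelled ExpressionContext that matched an ID, so
// its STRLIT() accessor returns null like the generated one would.
const exprCtx = {
  STRLIT: () => null,
  ID: () => ({ getText: () => "varGreeting" }),
  getText: () => "varGreeting",
};

// Without the call, this compares the accessor function itself, which is
// never null, so an ID would be mistaken for a string literal.
const withoutCall = exprCtx.STRLIT !== null;

// Calling the accessor returns null here, so the ID branch is taken.
const withCall = exprCtx.STRLIT() !== null;

console.log(withoutCall); // true
console.log(withCall);    // false
```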
PS: Do note that single quotes work just fine in JavaScript, so you don't actually need to strip the single quotes and replace them with double quotes. You could just use .getText() without any post-processing in both cases and that'd still come out as valid JavaScript.
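One caveat on both string approaches: wrapping the stripped text in double quotes produces invalid JavaScript if the literal contains a double quote (e.g. 'say "hi"'), and passing getText() through unchanged makes JavaScript interpret any backslashes as escape sequences. Assuming myLang treats everything between the quotes literally (the STRLIT rule defines no escapes), one option is to let JSON.stringify produce an escaped, double-quoted literal, which is also valid JavaScript. toJsString() is a hypothetical helper name.

```javascript
// Translate a myLang string literal (the getText() result, including the
// single quotes) into a JavaScript string literal.
// Assumption: myLang has no escape sequences, so the inner text is literal.
function toJsString(strlitText) {
  const inner = strlitText.slice(1, -1);
  // JSON.stringify escapes ", \ and control characters and wraps the
  // result in double quotes, which is also valid JavaScript source.
  return JSON.stringify(inner);
}

console.log(toJsString("'hello there'")); // "hello there"
console.log(toJsString(`'say "hi"'`));    // "say \"hi\""
console.log(toJsString("'C:\\temp'"));    // "C:\\temp"
```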
Source: https://stackoverflow.com/questions/60777041/variation-in-code-generation-using-antlr4-parse-tree-visitors