Once the AST is built, what is the best way to implement the tree walker so that functions can be defined and called in whatever order?
For example, this is valid in PHP:
Yes, you are right: this is done in more than one pass over the AST.
You first create a grammar that builds an AST of the source, then you create a tree grammar that iterates over the tree and discovers all defined functions. You could then evaluate the script using another tree grammar that takes the discovered functions from the first tree grammar.
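Independent of ANTLR, the two-pass idea can be sketched in plain Java. This is only a minimal illustration under some assumptions: the Node class is a hypothetical stand-in for ANTLR's CommonTree, the tree is built by hand instead of being parsed, and the names TwoPassSketch, discover and undefinedCalls are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class TwoPassSketch {

    // Hypothetical stand-in for an AST node (ANTLR's CommonTree plays this role in the answer).
    static final class Node {
        final String type;  // "SCRIPT", "F_CALL", "F_DECL" or "F_BODY"
        final String name;  // function name, or null for SCRIPT and F_BODY
        final List<Node> children;

        Node(String type, String name, Node... children) {
            this.type = type;
            this.name = name;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    // Pass 1: collect the names of all declared functions, wherever they appear.
    static Set<String> discover(Node root) {
        Set<String> declared = new LinkedHashSet<>();
        collectDeclarations(root, declared);
        return declared;
    }

    private static void collectDeclarations(Node node, Set<String> declared) {
        if (node.type.equals("F_DECL")) {
            declared.add(node.name);
        }
        for (Node child : node.children) {
            collectDeclarations(child, declared);
        }
    }

    // Pass 2: list every called name that pass 1 did not find.
    static List<String> undefinedCalls(Node root, Set<String> declared) {
        List<String> missing = new ArrayList<>();
        collectUndefined(root, declared, missing);
        return missing;
    }

    private static void collectUndefined(Node node, Set<String> declared, List<String> missing) {
        if (node.type.equals("F_CALL") && !declared.contains(node.name)) {
            missing.add(node.name);
        }
        for (Node child : node.children) {
            collectUndefined(child, declared, missing);
        }
    }

    // Hand-built tree for: f(); function f() { g(); } function g() {}
    static Node sample() {
        return new Node("SCRIPT", null,
                new Node("F_CALL", "f"),
                new Node("F_DECL", "f",
                        new Node("F_BODY", null, new Node("F_CALL", "g"))),
                new Node("F_DECL", "g", new Node("F_BODY", null)));
    }

    public static void main(String[] args) {
        Node root = sample();
        Set<String> declared = discover(root);
        System.out.println("Declared functions: " + declared);
        System.out.println("Undefined calls: " + undefinedCalls(root, declared));
    }
}
```

Because the first pass finishes before any call is looked at, the call to f() at the top of the script resolves even though f is declared later.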
Take the source:
<?php
f(); // function called before it’s defined
function f() {
g();
}
function g() {}
?>
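which is parsed into the following AST (written here in the nested-list form that toStringTree() prints):
(SCRIPT (F_CALL f) (F_DECL f (F_BODY (F_CALL g))) (F_DECL g F_BODY))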
using the (combined) grammar:
grammar PHPMin;
options {
output=AST;
}
tokens {
SCRIPT; F_CALL; F_DECL; F_BODY;
}
parse
: script EOF -> script
;
script
: '<?php' atom* '?>' -> ^(SCRIPT atom*)
;
atom
: functionCall
| functionDecl
;
functionCall
: Identifier '(' ')' ';' -> ^(F_CALL Identifier)
;
functionDecl
: 'function' Identifier '(' ')' '{' functionBody '}' -> ^(F_DECL Identifier functionBody)
;
functionBody
: functionCall* -> ^(F_BODY functionCall*)
;
Identifier : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')* ;
LineComment : '//' ~('\r' | '\n')* ('\r'? '\n' | EOF){skip();} ;
Space : (' ' | '\t' | '\r' | '\n'){skip();} ;
Then discover the declared functions using a "tree-walker" generated from the following tree grammar:
tree grammar PHPMinFunctionWalker;
options {
tokenVocab=PHPMin;
ASTLabelType=CommonTree;
}
@members {
java.util.Set<String> declared = new java.util.HashSet<String>();
}
discover
: script
;
script
: ^(SCRIPT atom*)
;
atom
: functionCall
| functionDecl
;
functionCall
: ^(F_CALL Identifier)
;
functionDecl
: ^(F_DECL Identifier functionBody) {declared.add($Identifier.text);}
;
functionBody
: ^(F_BODY functionCall*)
;
To test it all, create a lexer and parser (A), generate the "tree-walker" (B), compile all source files (C):
// A
java -cp antlr-3.2.jar org.antlr.Tool PHPMin.g
// B
java -cp antlr-3.2.jar org.antlr.Tool PHPMinFunctionWalker.g
// C
javac -cp antlr-3.2.jar *.java
// D
java -cp .:antlr-3.2.jar Main // *nix
java -cp .;antlr-3.2.jar Main // Windows
and run the following main class (D):
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
    public static void main(String[] args) throws Exception {
        String source = "<?php \n" +
                "f(); // function called before it’s defined \n" +
                "function f() { \n" +
                " g(); \n" +
                "} \n" +
                "function g() {} \n" +
                "?> \n";

        // create a lexer and parser for the source
        ANTLRStringStream in = new ANTLRStringStream(source);
        PHPMinLexer lexer = new PHPMinLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        PHPMinParser parser = new PHPMinParser(tokens);
        PHPMinParser.parse_return returnValue = parser.parse();
        CommonTree tree = (CommonTree)returnValue.getTree();

        // create a tree walker to discover all declared functions
        CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
        nodes.setTokenStream(tokens);
        PHPMinFunctionWalker functions = new PHPMinFunctionWalker(nodes);
        functions.discover();
        System.out.println("Declared functions: "+functions.declared);
    }
}
which produces the following output:
Declared functions: [f, g]
Of course, this is just an example of how to approach it, not of how it is best done. I can imagine (when using Java to interpret the script), you wouldn't store the declared functions as simple Strings in a Set<String>, but rather as a Map<String, CommonTree> to easily get the root of a function and evaluate it when called.
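To make the Map idea a bit more concrete, here is a rough plain-Java sketch. It is not ANTLR code: Node is again a hypothetical stand-in for CommonTree, the tree is built by hand, and FunctionTableSketch, discover and run are names invented for this illustration. "Evaluating" a function here just means recording its name and walking the calls in its body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class FunctionTableSketch {

    // Hypothetical stand-in for CommonTree.
    static final class Node {
        final String type;  // "SCRIPT", "F_CALL", "F_DECL" or "F_BODY"
        final String name;  // function name, or null for SCRIPT and F_BODY
        final List<Node> children;

        Node(String type, String name, Node... children) {
            this.type = type;
            this.name = name;
            this.children = new ArrayList<>(Arrays.asList(children));
        }
    }

    // Pass 1: map each function name to the root of its declaration.
    static Map<String, Node> discover(Node script) {
        Map<String, Node> functions = new LinkedHashMap<>();
        for (Node child : script.children) {
            if (child.type.equals("F_DECL")) {
                functions.put(child.name, child);
            }
        }
        return functions;
    }

    // Pass 2: run the top-level calls and return the names in the order they were invoked.
    static List<String> run(Node script, Map<String, Node> functions) {
        List<String> trace = new ArrayList<>();
        for (Node child : script.children) {
            if (child.type.equals("F_CALL")) {
                call(child.name, functions, trace);
            }
        }
        return trace;
    }

    private static void call(String name, Map<String, Node> functions, List<String> trace) {
        Node decl = functions.get(name);
        if (decl == null) {
            throw new IllegalStateException("no such function: " + name);
        }
        trace.add(name);
        Node body = decl.children.get(0);  // the single F_BODY child of an F_DECL
        for (Node statement : body.children) {
            call(statement.name, functions, trace);  // a body only contains F_CALL nodes
        }
    }

    // Hand-built tree for: f(); function f() { g(); } function g() {}
    static Node sample() {
        return new Node("SCRIPT", null,
                new Node("F_CALL", "f"),
                new Node("F_DECL", "f",
                        new Node("F_BODY", null, new Node("F_CALL", "g"))),
                new Node("F_DECL", "g", new Node("F_BODY", null)));
    }

    public static void main(String[] args) {
        Node script = sample();
        System.out.println("Call order: " + run(script, discover(script)));
    }
}
```

With a real interpreter the map lookup would hand the subtree to an evaluating tree walker instead of a simple loop, but the lookup-by-name step would be the same.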
Further reading: http://www.antlr.org/wiki/display/ANTLR3/Simple+tree-based+interpeter
Good luck!
EDIT
The second pass could then check whether every called function is declared, using the set of names collected by the previous tree-walker:
tree grammar PHPMinValidateWalker;
options {
tokenVocab=PHPMin;
ASTLabelType=CommonTree;
}
@members {
java.util.Set<String> declared = new java.util.HashSet<String>();
}
validate
: script
;
script
: ^(SCRIPT atom*)
;
atom
: functionCall
| functionDecl
;
functionCall
: ^(F_CALL Identifier)
{
if(!declared.contains($Identifier.text)) {
throw new RuntimeException("no such function: " + $Identifier.text);
}
}
;
functionDecl
: ^(F_DECL Identifier functionBody)
;
functionBody
: ^(F_BODY functionCall*)
;
Using the test:
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
    public static void main(String[] args) throws Exception {
        String source = "<?php \n" +
                "f(); // function called before it’s defined \n" +
                "function f() { \n" +
                " g(); \n" +
                " x(); \n" +
                "} \n" +
                "function g() {} \n" +
                "?> \n";

        // create a lexer and parser for the source
        ANTLRStringStream in = new ANTLRStringStream(source);
        PHPMinLexer lexer = new PHPMinLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        PHPMinParser parser = new PHPMinParser(tokens);
        PHPMinParser.parse_return returnValue = parser.parse();
        CommonTree tree = (CommonTree)returnValue.getTree();

        // create a tree walker to discover all declared functions
        CommonTreeNodeStream nodes = new CommonTreeNodeStream(tree);
        nodes.setTokenStream(tokens);
        PHPMinFunctionWalker functions = new PHPMinFunctionWalker(nodes);
        functions.discover();
        System.out.println("Declared functions: "+functions.declared);

        // create a second tree walker that checks every call against the declared functions
        nodes = new CommonTreeNodeStream(tree);
        nodes.setTokenStream(tokens);
        PHPMinValidateWalker validator = new PHPMinValidateWalker(nodes);
        validator.declared = functions.declared;
        validator.validate();
    }
}
produces an exception since x() is not defined anywhere. Removing x() from the source lets the tree-walker finish without an exception.