I\'m trying to give a broad picture of my problem. I need to write a program with Node.js that should be able to detect all dependencies a function.
E.g.
(I apologise in advance: I usually try and make my answers humorous to ease the reader through them, but I couldn't successfully do so in this case. Consider that a double apology for the length of this answer.)
This is not an easy problem. Instead of solving it in full, we will limit its scope - we will only solve the portion of the problem we care about. We will do so by parsing the input with a JavaScript parser and going over it with a simple recurive-descent algorithm. Our algorithm will analyse the program's scope and correctly identify function calls.
All the rest is just filling in the blanks! The result is at the bottom of the answer, so I recommend you grab to the first comment if you don't want to read through.
As Benjamin Gruenbaum's answer says, this is a very, very hard problem because of JavaScript's dynamic nature. However, what if instead of making a solution which'll work for 100% of programs, we instead do it for a subset of programs, if we limit ourselves to handle certain things?
The most important limitation:
eval
. If we include eval
, it's a spiral into chaos. This is because eval let's you use arbitrary strings which makes tracking dependencies impossible without checking every possible input. In NodeJS there are no document.write
s and setTimeout
only accepts a function so we don't have to worry about those. However, we also disallow the vm module.The following limitations are to ease the process. They may be solvable, but solving them is out of scope for this answer:
obj[key]()
it saddens me to introduce this limitation, but it's definitely solvable for some cases (e.g. key = 'foo'
but not key = userInput()
)var self = this
. Definitely solvable with a complete scope parser.(a, b)()
And finally, limitations to the implementation in this answer - either because of complexity constraints or time constraints (but they are very solvable):
foo.bar()
or this.foo()
would've at least doubled the program complexity. Put in enough time, and it's very doable.with
statement, catch
blocks). We don't deal with them.In this answer, I'll outline (and provide) a proof-of-concept parser.
Given a program, how can we decipher its function dependencies?
//A. just a global function
globalFunction();
//B. a function within a function
var outer = function () {
function foo () {}
foo();
};
//C. calling a function within itself
var outer = function inner () {
inner();
};
//D. disambiguating between two identically named functions
function foo () {
var foo = function () {};
foo();
}
foo();
In order to understand a program, we need to break its code apart, we need to understand its semantics: we need a parser. I've chosen acorn because I've never used it and heard good praise. I suggest you play with it a bit, see what programs look like in SpiderMonkeys's AST.
Now that we have a magical parser which transforms JavaScript into an AST (an Abstract Syntax Tree), how will we logically handle finding dependencies? We'll need do two things:
We can see why example D above can be ambiguous: There are two functions called foo
, how can we know which one foo()
means? That's why we need to implement scoping.
Since the solution is in two parts, let's solve it that way. Beginning from the biggest problem:
So...we have an AST. It has a bunch of nodes. How do we build a scope? Well, we only care about function scope. That eases the process, as we know we only have to deal with functions. But before we talk about how to use scopes, let's define the function which makes scopes.
What does a scope have? It's not a complex being: It has a parent scope (or null
if it's the global scope), and it has the items it contains. We need a way to add things to a scope, and get things from one. Let's do that:
var Scope = function (parent) {
var ret = { items : {}, parent : parent, children : [] };
ret.get = function (name) {
if (this.items[name]) {
return this.items[name];
}
if (this.parent) {
return this.parent.get(name);
}
//this is fake, as it also assumes every global reference is legit
return name;
};
ret.add = function (name, val) {
this.items[name] = val;
};
if (parent) {
parent.children.push(ret);
}
return ret;
};
As you may have noticed, I'm cheating in two aspects: First, I'm assigning child scopes. That is to make it easier for us measly humans to see that things are working (otherwise, all scoping would be internal, we'd only see the global scope). Second, I'm assuming the global scope contains all - that is, if foo
isn't defined in any scope, then it must be an existing global variable. That may or may not be desirable.
OK, we have a way to represent scopes. Don't crack open the champagne yet, we still have to actually make them! Let's see how a simple function declaration, function f(){}
looks like in AST:
{
"type": "Program",
"start": 0,
"end": 14,
"body": [{
"type": "FunctionDeclaration",
"start": 0,
"end": 14,
"id": {
"type": "Identifier",
"start": 9,
"end": 10,
"name": "f"
},
"params": [],
"body": {
"type": "BlockStatement",
"start": 12,
"end": 14,
"body": []
}
}]
}
That's quite a mouthful, but we can brave through it! The juicy part is this:
{
"type": "FunctionDeclaration",
"id": {
"type": "Identifier",
"name": "f"
},
"params": [ ... ],
"body": { ... }
}
We have a FunctionDeclaration
node with an id
property. That id
's name is our function's name! Let's assume we have a function walk
which takes care of walking over nodes, and currentScope
and currentFuncName
variables, and we've just arrived at parsing our function declaration node
. How do we do it? Code speaks louder than words:
//save our state, so we will return to it after we handled the function
var cachedScope = currentScope,
cachedName = currentFuncName;
//and now we change the state
currentScope = Scope(cachedScope);
currentFuncName = node.id.name;
//create the bindings in the parent and current scopes
//the following lines have a serious bug, we'll get to it later (remember that
// we have to meet Captain Crunchypants)
cachedScope.add(currentFuncName, currentName);
currentScope.add(currentFuncName, currentName);
//continue with the parsing
walk(node.body);
//and restore the state
currentScope = cachedScope;
currentFuncName = cachedName;
But wait, what about function expressions? They behave a bit differently! First and foremost, they don't necessarily have a name, and if they do, it's only visible inside them:
var outer = function inner () {
//outer doesn't exist, inner is visible
};
//outer is visible, inner doesn't exist
Let's make another huge assumption that we've dealt with the variable declaration part - we created the proper binding at the parent scope. Then, the logic above for handling functions changes only slightly:
...
//and now we change the state
currentScope = Scope(cachedScope);
//we signify anonymous functions with <anon>, since a function can never be called that
currentFuncName = node.id ? node.id.name : '<anon>';
...
if (node.id) {
currentScope.add(currentFuncName, currentFuncName);
}
if (node.type === 'FunctionDeclaration') {
cachedScope.add(currentFuncName, currentFuncName);
}
...
And believe it or not, that's more or less the entire scope handling mechanism in the final solution. I expect as you add things like objects it'll get more a bit more complicated, but it not by much.
It's time to meet Captain Crunchpants. The very observant listener will by now have remembered example D. Let's freshen our memory:
function foo () {
function foo () {}
foo();
}
foo();
In parsing that, we need a way to tell the outer foo
and the inner foo
apart - otherwise, we won't be able to know which of these foo
calls, and our dependency finder will be toast. Furthermore, we won't be able to tell them apart in the dependency management - if we just add to the results by function name, we'll get overwriting. In other words, we need an absolute function name.
I chose to represent nesting with separation with a #
character. The above, then, has a function foo
, with an inner function foo#foo
, with a call to foo#foo
and a call to foo
. Or, for a less confusing example:
var outer = function () {
function inner () {}
inner();
};
outer();
Has a function outer
and a function outer#inner
. There's a call to outer#inner
and a call to outer
.
So, let's create this function which takes the previous name, and the current function's name, and mushes them together:
function nameToAbsolute (parent, child) {
//foo + bar => foo#bar
if (parent) {
return parent + '#' + name;
}
return name;
}
And modify our function handling pseudo-code (which is about to come to life! I promise!):
...
currentScope = Scope(cachedScope);
var name = node.id ? node.id.name : '<anon>';
currentFuncName = nameToAbsolute(cachedName, name);
...
if (node.id) {
currentScope.add(name, currentFuncName);
}
if (node.type === 'FunctionDeclaration') {
cachedScope.add(name, currentFuncName);
}
Now we're talking! It's time to move on to actually doing something! Maybe I've lied to you all along and I know nothing, maybe I failed miserably and I continued writing until now because I knew nobody will read this far and I'll get many upvotes because it's a long answer!?
HAH! Keep dreming! There's much more to come! I didn't sit on this for a few days for no reason! (As an interesting social experiment, could anyone upvoting comment, saying something around the lines "Captain Crunchpants was happy to see you"?)
On a more serious note, we should begin making the parser: What holds our state and walks over nodes. Since we'll have two parsers at the end, scope and dependency, we'll make a "master parser" which calls each one when needed:
var parser = {
results : {},
state : {},
parse : function (string) {
this.freshen();
var root = acorn.parse(string);
this.walk(root);
return this.results;
},
freshen : function () {
this.results = {};
this.results.deps = {};
this.state = {};
this.state.scope = this.results.scope = Scope(null);
this.state.name = '';
},
walk : function (node) {
//insert logic here
},
// '' => 'foo'
// 'bar' => 'bar#foo'
nameToAbsolute : function (parent, name) {
return parent ? parent + '#' + name : name;
},
cacheState : function () {
var subject = this.state;
return Object.keys( subject ).reduce(reduce, {});
function reduce (ret, key) {
ret[key] = subject[key];
return ret;
}
},
restoreState : function (st) {
var subject = this.state;
Object.keys(st).forEach(function (key) {
subject[key] = st[key];
});
}
};
That's a bit of cruft, but hopefully it's understandable. We made state
into an object, and to make it flexible, cacheState
and restoreState
are simply cloning/merging.
Now, for our beloved scopeParser
:
var scopeParser = {
parseFunction : function (func) {
var startState = parser.cacheState(),
state = parser.state,
name = node.id ? node.id.name : '<anon>';
state.scope = Scope(startState.scope);
state.name = parser.nameToAbsolute(startState.name, name);
if (func.id) {
state.scope.add(name, state.name);
}
if (func.type === 'FunctionDeclaration') {
startState.scope.add(name, state.name);
}
this.addParamsToScope(func);
parser.walk(func.body);
parser.restoreState(startState);
}
};
The casually observant reader will notice that parser.walk
is empty. Time to fill 'er up!
walk : function (node) {
var type = node.type;
//yes, this is tight coupling. I will not apologise.
if (type === 'FunctionDeclaration' || type === 'FunctionExpression') {
scopeParser.parseFunction(node)
}
else if (node.type === 'ExpressionStatement') {
this.walk(node.expression);
}
//Program, BlockStatement, ...
else if (node.body && node.body.length) {
node.body.forEach(this.walk, this);
}
else {
console.log(node, 'pass through');
}
//...I'm sorry
}
Again, mostly technicalities - to understand these, you need to play with acorn. We want to make sure we iterate and walk into nodes correctly. Expressions Nodes like (function foo() {})
has an expression
property we walk over, BlockStatement
Nodes (e.g. the actual body of a function) and Program Nodes have a body
array, etc.
Since we have something resembling logic, let's try:
> parser.parse('function foo() {}').scope
{ items: { foo: 'foo' },
parent: null,
children:
[ { items: [Object],
parent: [Circular],
children: [],
get: [Function],
add: [Function] } ],
get: [Function],
add: [Function] }
Neat! Play around with function declarations and expressions, see that they're nested correctly. We did however forget to include variable declaration:
var foo = function () {};
bar = function () {};
A good (and fun!) exercise is adding them yourself. But don't worry - they'll be included in the final parser;
Who'd believe!? We're done with scopes! D-O-N-E! Let's do a cheer!
Oh oh oh...where did you think you're going!? We only solved part of the problem - we still have to find the dependencies! Or did you forget all about it!? Fine, you can go to the toilet. But it better be #1.
Wow, did you even remember we had section numbers? On an unrelated note, when I typed the last sentence, my keyboard made a sound reminiscent of the first note of the Super Mario Theme Song. Which is now stuck in my head.
Ok! So, we have our scopes, we have our function names, it's time to identify function calls! This will not take long. Doing acorn.parse('foo()')
gives:
{
"type": "Program",
"body": [{
"type": "ExpressionStatement",
"expression": {
"type": "CallExpression",
"callee": {
"type": "Identifier",
"name": "f"
},
"arguments": []
}
}]
}
So we're looking for a CallExpression
. But before we go all walk
over it, let's first review our logic. Given this node, what do we do? How do we add a dependency?
This is not a difficult problem, as we already took care of all the scoping. We add to the dependencies of the containing function (parser.state.name
) the scope resolution of callExpression.callee.name
. Sounds simple!
var deps = parser.results.deps,
scope = parser.state.scope,
context = parser.state.name || '<global>';
if (!deps[context]) {
deps[context] = [];
}
deps[context].push(scope.get(node.callee.name));
There're, once again, a trick with handling the global context. If the current state is nameless, we assume it's the global context and give it a special name <global>
.
Now that we have that, let's build our dependencyParser
:
var dependencyParser = {
parseCall : function (node) {
...the code above...
}
};
Truly beautiful. We still need to modify parser.walk
to include CallExpression
s:
walk : function (node) {
...
else if (type === 'CallExpression') {
dependencyParser.parseCall(node);
}
}
And try it out on example D:
> parser.parse('function foo() { var foo = function () {}; foo(); } foo()').deps
{ foo: [ 'foo#foo' ], '<global>': [ 'foo' ] }
HAHA! IN YOUR FACE, PROBLEM! WOOOOOOOOOOO!
You may commence celebrations. Remove your pants, run around in the city, claim you're the town chicken and burn stray garbage cans (Zirak and Affiliates in no way support arson of any kind or indecent exposure. Any action taken by oh, say, any reader is not to be blamed upon Zirak and/or Affiliates).
But seriously now. We solved a very, very limited subset of the problem, and to solve it for a small percentage of real-case scenarios there are a lot of things which have to be done. This is not a discouragement - quite the opposite! I urge you to try and do this. It's fun! (Zirak and Affiliates are in no way responsible for any mental breakdown as a result from trying to to what was just said)
Presented here is the source code of the parser, sans any NodeJS specific stuff (i.e. requiring acorn or exposing the parser):
var parser = {
results : {},
state : {},
verbose : false,
parse : function (string) {
this.freshen();
var root = acorn.parse(string);
this.walk(root);
return this.results;
},
freshen : function () {
this.results = {};
this.results.deps = {};
this.state = {};
this.state.scope = this.results.scope = Scope(null);
this.state.name = '';
},
walk : function (node) {
var type = node.type;
//yes, this is tight coupling. I will not apologise.
if (type === 'FunctionDeclaration' || type === 'FunctionExpression') {
scopeParser.parseFunction(node)
}
else if (type === 'AssignmentExpression') {
scopeParser.parseBareAssignmentExpression(node);
}
else if (type === 'VariableDeclaration') {
scopeParser.parseVarDeclaration(node);
}
else if (type === 'CallExpression') {
dependencyParser.parseCall(node);
}
else if (node.type === 'ExpressionStatement') {
this.walk(node.expression);
}
//Program, BlockStatement, ...
else if (node.body && node.body.length) {
node.body.forEach(this.walk, this);
}
else if (this.verbose) {
console.log(node, 'pass through');
}
//...I'm sorry
},
// '' => 'foo'
// 'bar' => 'bar#foo'
nameToAbsolute : function (parent, name) {
return parent ? parent + '#' + name : name;
},
cacheState : function () {
var subject = this.state;
return Object.keys( subject ).reduce(reduce, {});
function reduce (ret, key) {
ret[key] = subject[key];
return ret;
}
},
restoreState : function (st) {
var subject = this.state;
Object.keys(st).forEach(function (key) {
subject[key] = st[key];
});
}
};
var dependencyParser = {
//foo()
//yes. that's all.
parseCall : function (node) {
if (parser.verbose) {
console.log(node, 'parseCall');
}
var deps = parser.results.deps,
scope = parser.state.scope,
context = parser.state.name || '<global>';
if (!deps[context]) {
deps[context] = [];
}
deps[context].push(scope.get(node.callee.name));
}
};
var scopeParser = {
// We only care about these kinds of tokens:
// (1) Function declarations
// function foo () {}
// (2) Function expressions assigned to variables
// var foo = function () {};
// bar = function () {};
//
// Do note the following property:
// var foo = function bar () {
// `bar` is visible, `foo` is not
// };
// `bar` is not visible, `foo` is
/*
function foo () {}
=>
{
"type": 'FunctionDeclaration',
"id": {
"type": Identifier,
"name": 'foo'
},
"params": [],
"body": { ... }
}
(function () {})
=>
{
"type": "FunctionExpression",
"id": null,
"params": [],
"body": { ... }
}
*/
parseFunction : function (func) {
if (parser.verbose) {
console.log(func, 'parseFunction');
}
var startState = parser.cacheState(),
state = parser.state,
name = this.grabFuncName(func);
state.scope = Scope(startState.scope);
state.name = parser.nameToAbsolute(startState.name, name);
if (func.id) {
state.scope.add(name, state.name);
}
if (func.type === 'FunctionDeclaration') {
startState.scope.add(name, state.name);
}
this.addParamsToScope(func);
parser.walk(func.body);
parser.restoreState(startState);
},
grabFuncName : function (func) {
if (func.id) {
return func.id.name;
}
else if (func.type === 'FunctionExpression') {
return '<anon>';
}
else {
//...this shouldn't happen
throw new Error(
'scope.parseFunction encountered an anomalous function: ' +
'nameless and is not an expression');
}
},
/*
[{
"type": "Identifier",
"name": "a"
}, {
"type": "Identifier",
"name": "b"
}, {
"type": "Identifier",
"name": "c"
}]
*/
addParamsToScope : function (func) {
var scope = parser.state.scope,
fullName = parser.state.name;
func.params.forEach(addParam);
function addParam (param) {
var name = param.name;
scope.add(name, parser.nameToAbsolute(fullName, name));
}
},
parseVarDeclaration : function (tok) {
if (parser.verbose) {
console.log(tok, 'parseVarDeclaration');
}
tok.declarations.forEach(parseDecl, this);
function parseDecl (decl) {
this.parseAssignment(decl.id, decl.init);
}
},
// Lacking a better name, this:
// foo = function () {}
// without a `var`, I call a "bare assignment"
parseBareAssignmentExpression : function (exp) {
if (parser.verbose) {
console.log(exp, 'parseBareAssignmentExpression');
}
this.parseAssignment(exp.left, exp.right);
},
parseAssignment : function (id, value) {
if (parser.verbose) {
console.log(id, value, 'parseAssignment');
}
if (!value || value.type !== 'FunctionExpression') {
return;
}
var name = id.name,
val = parser.nameToAbsolute(parser.state.name, name);
parser.state.scope.add(name, val);
this.parseFunction(value);
}
};
var Scope = function (parent) {
var ret = { items : {}, parent : parent, children : [] };
ret.get = function (name) {
if (this.items[name]) {
return this.items[name];
}
if (this.parent) {
return this.parent.get(name);
}
//this is fake, as it also assumes every global reference is legit
return name;
};
ret.add = function (name, val) {
this.items[name] = val;
};
if (parent) {
parent.children.push(ret);
}
return ret;
};
Now if you'll excuse me, I need a long shower.
a.toString()
possiblefuncname(
and possiblefuncname.call(
and possiblefuncname.apply(
You can use statistical profiler log (node --prof yourprogram
, v8.log) to calculate 'statistical' call graph. Take a look at log processor source code here and here
No.
Sorry, this is impossible on a pretty theoretical level in a dynamic language with eval. Good IDEs detect basic stuff, but there are some things you simply can't detect very well:
Let's take your simple case:
function a() {
//do something
b();
};
Let's complicate it a bit:
function a() {
//do something
eval("b();")
};
Now we have to detect stuff in strings, let's go one step ahead:
function a() {
//do something
eval("b"+"();");
};
Now we have to detect the result of string concats. Let's do a couple more of those:
function a() {
//do something
var d = ["b"];
eval(d.join("")+"();");
};
Still not happy? Let's encode it:
function a() {
//do something
var d = "YigpOw==";
eval(atob(d));
};
Now, these are some very basic cases, I can complicate them as much as I want. There really is no way around running the code - you'd have to run it on every possible input and check and we all know that that's impractical.
Pass dependencies as parameters to the function and use inversion of control. Always be explicit about your more complicated dependencies and not implicit. That way you won't need tools to know what your dependencies are :)