I am attempting to design a basic compiler to pseudo-assembly code. However, I cannot figure out how to implement closures. It seems I would need to associate specific register
Stacks cannot be sufficient... consider a simpler case where they do
function bar(f) {
alert(f());
}
function foo(x) {
bar(function(){ return x; });
}
foo(42);
In the above case it would be theoretically possible for the x
in the closure to live in the stack frame of foo
because the closure is not going to outlive its creator foo
.
However with a small change:
function bar(f) {
to_call_later.push(f);
}
the closure will be stored away and will be potentially called when foo
already terminated and the stack space for its activation record has been reclaimed. Clearly x
cannot be in that stack area because it must survive.
Therefore there are two problems:
a closure must have some storage (environment). This is obvious when you think that calling foo
twice passing two different values should create two independent storages for x
. If the closure was just the code then this is not possible unless different code was going to be generated each time you call foo
.
this storage must live at least as long as the closure itself, not only as who creates the closure.
Note also that if you want to have read/write closed-over variables you need an extra level of indirection, for example:
function bar(f) {
alert(f());
}
function foo(x) {
var c1 = function() { return ++x; };
var c2 = function() { return x *= 2; };
bar(c1);
bar(c2);
}
foo(42); // displays 42+1=43 and 43*2=86 (not 42*2=84!)
in other words you can have several different closures sharing the same environment.
So x
cannot be in the stack of foo
activation record and it cannot be in the closure object itself. The closure object must have a pointer to where x
is living.
A possible solution to implement this on say x86 is:
Use a garbage collected or reference-counted memory management system. Stacks are by far insufficient to handle closures.
Each closure is an object with two fields: a pointer to code and an array of pointers to closed-over variables (the "environment").
When executing code you have a stack esp
and e.g. esi
is pointing to the closure object itself (so (esi)
is the address of the code, (esi+4)
is the address of first closed-over variable, (esi+8)
is the address of second closed-over variable and so on).
Each variable is an independent heap-allocated object that can survive as long as there are still closures pointing to it.
This is of course a very crude approach. For example SBCL is much smarter and variables that are not captured are allocated on stack and/or registers only. This requires doing an analysis of how a closure is used.
Supposing you're only considering a purely functional setting (in other words the return value of a function/closure depends only on the passed parameter and the closure state cannot mutate) then things can be simplified a little.
What you can do is making the closure object containing the captured values instead of the captured variables and by making at the same time the closure itself a copyable object then just a stack can in theory be used (except that there is the problem that a closure can vary in size depending on how much state needs to capture), so it's not easy at least for me to imagine a reasonable stack-only based protocol for parameter passing and value returning in this case.
Removing the variable size problem by making the closure a fixed-size object you can see how this C program can implement closures using only stack (note that there are no malloc
calls)
#include <stdio.h>
typedef struct TClosure {
int (*code)(struct TClosure *env, int);
int state;
} Closure;
int call(Closure *c, int x) {
return c->code(c, x);
}
int adder_code(Closure *env, int x) {
return env->state + x;
}
int multiplier_code(Closure *env, int x) {
return env->state * x;
}
Closure make_closure(int op, int k) {
Closure c;
c.state = k;
c.code = (op == '+' ? adder_code : multiplier_code);
return c;
}
int main(int argc, const char *argv[]) {
Closure c1 = make_closure('+', 10);
Closure c2 = make_closure('*', 3);
printf("c1(3) = %i, c2(3) = %i\n",
call(&c1, 3), call(&c2, 3));
return 0;
}
Closure
structs can be passed, returned and stored on stack because the environment is read-only so you don't have the lifetime problem because immutable data can be copied without affecting semantic.
A C compiler could use such an approach to create closures that can only capture variables by value, and indeed is what C++11 lambda provide (you can capture also by reference, but it's up to the programmer to ensure that the lifetime of captured variables lasts enough).