Implementing Closures in a Compiler

孤街浪徒 2021-02-20 12:37

I am attempting to design a basic compiler to pseudo-assembly code. However, I cannot figure out how to implement closures. It seems I would need to associate specific register…

1 Answer
  •  臣服心动
    2021-02-20 13:24

    Stacks alone cannot be sufficient. Consider first a simpler case where they would be:

    function bar(f) {
        alert(f());
    }
    
    function foo(x) {
        bar(function(){ return x; });
    }
    
    foo(42);
    

    In the above case it would theoretically be possible for the x in the closure to live in the stack frame of foo, because the closure is not going to outlive its creator foo. However, with a small change to bar:

    var to_call_later = [];  // closures to be invoked later
    
    function bar(f) {
        to_call_later.push(f);
    }
    

    the closure is stored away and may be called after foo has already returned and the stack space for its activation record has been reclaimed. Clearly x cannot live in that stack area, because it must survive foo's return.

    Therefore there are two problems:

    1. A closure must have some storage of its own (its environment). This becomes obvious when you consider that calling foo twice with two different values must create two independent storage locations for x. If the closure were just code, this would be impossible unless different code were generated on each call to foo.

    2. This storage must live at least as long as the closure itself, not merely as long as the function that created the closure.

    Note also that if closed-over variables can be written as well as read, you need an extra level of indirection. For example:

    function bar(f) {
        alert(f());
    }
    
    function foo(x) {
        var c1 = function() { return ++x; };
        var c2 = function() { return x *= 2; };
        bar(c1);
        bar(c2);
    }
    
    foo(42);  // displays 42+1=43 and 43*2=86 (not 42*2=84!)
    

    In other words, several different closures can share the same environment.

    So x can live neither in the stack frame of foo's activation record nor inside the closure objects themselves. Each closure object must instead hold a pointer to where x lives.

    One possible way to implement this on, say, x86:

    • Use a garbage-collected or reference-counted memory management system; a stack alone is nowhere near sufficient to handle closures.

    • Each closure is an object with two fields: a pointer to the code and an array of pointers to the closed-over variables (the "environment").

    • While closure code runs, esp points to the stack as usual and another register, e.g. esi, points to the closure object itself: (esi) holds the address of the code, (esi+4) the address of the first closed-over variable, (esi+8) the address of the second, and so on.

    • Each variable is an independent heap-allocated object that survives as long as any closure still points to it.

    This is of course a very crude approach. SBCL, for example, is much smarter: variables that are not captured are allocated only on the stack and/or in registers. This requires an analysis of how closures are used.

    Edit

    If you are only considering a purely functional setting (in other words, the return value of a function or closure depends only on its parameters, and closure state cannot be mutated), things can be simplified a little.

    You can make the closure object contain the captured values instead of the captured variables and, at the same time, make the closure itself a copyable object; then, in theory, a stack alone can be used. The catch is that a closure can vary in size depending on how much state it needs to capture, so it is not easy, at least for me, to imagine a reasonable stack-only protocol for passing and returning closures in this case.

    If you remove the variable-size problem by making the closure a fixed-size object, this C program shows how closures can be implemented using only the stack (note that there are no malloc calls):

    #include <stdio.h>
    
    /* A fixed-size closure: a code pointer plus one int of captured state. */
    typedef struct TClosure {
        int (*code)(struct TClosure *env, int);
        int state;
    } Closure;
    
    /* Invoke a closure, passing the closure itself as its environment. */
    int call(Closure *c, int x) {
        return c->code(c, x);
    }
    
    int adder_code(Closure *env, int x) {
        return env->state + x;
    }
    
    int multiplier_code(Closure *env, int x) {
        return env->state * x;
    }
    
    /* Returned by value: the whole closure is copied, no heap needed. */
    Closure make_closure(int op, int k) {
        Closure c;
        c.state = k;
        c.code = (op == '+' ? adder_code : multiplier_code);
        return c;
    }
    
    int main(void) {
        Closure c1 = make_closure('+', 10);
        Closure c2 = make_closure('*', 3);
        printf("c1(3) = %i, c2(3) = %i\n",
               call(&c1, 3), call(&c2, 3));   /* c1(3) = 13, c2(3) = 9 */
        return 0;
    }
    

    Closure structs can be passed, returned and stored on the stack because the environment is read-only: there is no lifetime problem, since immutable data can be copied without changing semantics.

    A C compiler could use such an approach to create closures that capture variables only by value, and this is indeed what C++11 lambdas provide (you can also capture by reference, but then it is up to the programmer to ensure that the captured variables live long enough).
