How would one implement Lazy Evaluation in C?

野的像风 2021-02-02 13:24

Take, for example, the following Python code:

def multiples_of_2():
  i = 0
  while True:
    i = i + 2
    yield i

How do we translate this into C?

9 answers
  • 2021-02-02 13:48

    As Will mentioned, languages like Python store the state of the generator's stack frame between successive calls for you. C has no such mechanism, so you have to do it yourself. The "generic" way of doing this is not for the faint-hearted, as Greg pointed out. The traditional C way is to define and maintain the state yourself and pass it in and out of your function. So:

    struct multiples_of_two_state {
           int i;
           /* all the state you need should go in here */
    };
    
    void multiples_of_two_init(struct multiples_of_two_state *s) {
        s->i = 0;
    }
    
    int multiples_of_two_next(struct multiples_of_two_state *s) {
        s->i += 2;
        return s->i;
    }
    
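    /* Usage (inside some function; needs <stdio.h>) */
    struct multiples_of_two_state s;
    multiples_of_two_init(&s);
    /* The sequence is unbounded, so the caller decides how many values to take. */
    for (int n = 0; n < 10; n++) {
        int result = multiples_of_two_next(&s);
        printf("next is %d\n", result);
    }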
    
  • 2021-02-02 13:53

    Check out setjmp/longjmp

    setjmp.h is a header defined in the C standard library to provide "non-local jumps," or control flow besides the usual subroutine call and return sequence. The paired functions setjmp and longjmp provide this functionality. First setjmp saves the environment of the calling function into a data structure, and then longjmp can use this structure to "jump" back to the point it was created, at the setjmp call.

    (Lua coroutines were implemented that way)
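
    To make the mechanism concrete, here is a minimal sketch of setjmp/longjmp in standard C. The names checkpoint, deep_work and run_with_escape are invented for this illustration. Note that it only jumps back up to a function that is still active; jumping into a function that has already returned is undefined behaviour, so this on its own does not give you a resumable generator.

    ```c
    #include <setjmp.h>
    #include <stdio.h>

    static jmp_buf checkpoint;   /* saved environment of the setjmp caller */

    /* Hypothetical helper: for n == 3 it abandons its own frame and
       jumps straight back to the checkpoint. */
    static void deep_work(int n)
    {
        if (n == 3)
            longjmp(checkpoint, 1);   /* setjmp now appears to return 1 */
        printf("deep_work(%d) returned normally\n", n);
    }

    /* Returns 0 if deep_work returned normally, 1 if it escaped via longjmp. */
    int run_with_escape(int n)
    {
        if (setjmp(checkpoint) != 0)  /* 0 on the direct call, non-zero after longjmp */
            return 1;
        deep_work(n);
        return 0;
    }
    ```

    Calling run_with_escape(1) takes the normal path, while run_with_escape(3) comes back through the setjmp call a second time.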

  • 2021-02-02 13:55

    I found a good article recently on coroutines in C, which describes one method of doing this. It's certainly not for the faint of heart.

  • 2021-02-02 13:55

    The key is keeping the state of the function between calls. You have a number of options:

    1. Static (or global) state. Means the sequence of calls to the function is not reentrant, i.e. you can't have the function call itself recursively, nor can you have more than one caller running different sequences of calls.

    2. Initialising (and possibly allocating) the state on or before the first call, and passing that to the function on each subsequent call.

    3. Doing clever stuff with setjmp/longjmp, the stack, or modifiable code (there's an article somewhere about currying functions in C that creates an object with the necessary code to call the curried function; a similar technique could create an object with the function's state and the necessary code to save and restore it for each call). (Edit: found it, http://asg.unige.ch/site/papers/Dami91a.pdf)

    Greg cites an interesting article, above, that presents a way of using static state with syntax similar to the yield statement. I liked it academically but probably wouldn't use it because of the reentrancy issue, and because I'm still surprised that the infamous Duff's Device even compiles ;-).
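
    For readers who have not seen the trick, here is a rough sketch of the general idea: a switch on a saved line number, with case labels placed inside the loop in the manner of Duff's Device. The GEN_BEGIN, GEN_YIELD and GEN_END macros are made up for this illustration and are not taken from the article.

    ```c
    #include <stdio.h>

    /* Illustrative macros: the switch jumps back to just after the last yield. */
    #define GEN_BEGIN     static int gen_line = 0; switch (gen_line) { case 0:
    #define GEN_YIELD(x)  do { gen_line = __LINE__; return (x); case __LINE__:; } while (0)
    #define GEN_END       }

    int multiples_of_2(void)
    {
        static int i;        /* must be static: ordinary locals do not survive the return */
        GEN_BEGIN;
        i = 0;
        for (;;) {
            i = i + 2;
            GEN_YIELD(i);    /* return i now, resume here on the next call */
        }
        GEN_END;
        return 0;            /* not reached */
    }
    ```

    Successive calls return 2, 4, 6 and so on. Because everything lives in static variables, there is exactly one sequence per program, which is the reentrancy issue mentioned above.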

    In practice, large C programs do want to compute things lazily, e.g. a database server may want to satisfy a SELECT ... LIMIT 10 query by wrapping the plain SELECT query inside something that will yield each row until 10 have been returned, rather than computing the whole result and then discarding most of it. The most C-like technique for this is to explicitly create an object for the state, and to call a function with it for each value. For your example, you might see something like:

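    /* Definitions in a library somewhere (needs <limits.h>). */
    typedef int M2_STATE;
    M2_STATE m2_new(void) { return 0; }
    /* Empty once another step of 2 would overflow int. */
    int m2_empty(const M2_STATE *s) { return *s > INT_MAX - 2; }
    /* Advance the state in place and return the new multiple of two. */
    int m2_next(M2_STATE *s) { *s += 2; return *s; }
    
    /* Caller (inside some function; needs <stdio.h>). */
    M2_STATE s = m2_new();
    while (!m2_empty(&s))
    {
        int num = m2_next(&s);
        printf("%d\n", num);
    }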
    

    This seems cumbersome for the multiples of two, but it becomes a useful pattern for more complicated generators. You can make the state more complicated without having to burden all the code that uses your generator with the details. Even better practice is to return an opaque pointer in the new function, and (unless GC is available) provide a function for cleaning up the generator.
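
    As a sketch of the opaque-pointer variant, assuming a single translation unit for brevity: the m2_gen type and the m2_gen_* function names are hypothetical, and in a real library the first part would live in a header and the second in the implementation file.

    ```c
    #include <limits.h>
    #include <stdlib.h>

    /* What a header would expose: callers never see the struct body. */
    typedef struct m2_gen m2_gen;
    m2_gen *m2_gen_new(void);
    int     m2_gen_empty(const m2_gen *g);
    int     m2_gen_next(m2_gen *g);
    void    m2_gen_free(m2_gen *g);

    /* What the implementation file would contain. */
    struct m2_gen {
        int i;   /* last value handed out */
    };

    m2_gen *m2_gen_new(void)
    {
        m2_gen *g = malloc(sizeof *g);
        if (g != NULL)
            g->i = 0;
        return g;                      /* NULL if allocation failed */
    }

    int m2_gen_empty(const m2_gen *g)
    {
        return g->i > INT_MAX - 2;     /* another step would overflow int */
    }

    int m2_gen_next(m2_gen *g)
    {
        g->i += 2;
        return g->i;
    }

    void m2_gen_free(m2_gen *g)
    {
        free(g);
    }
    ```

    Each call to m2_gen_new starts an independent sequence, and the struct can later grow new fields without any caller having to be recompiled against a changed layout.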

    The big advantage of allocating the state for each new sequence of calls is that it makes things like recursive generators possible. For example, a generator can return all files under a directory by calling itself on each subdirectory:

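    /* Needs <dirent.h>, <limits.h>, <stdio.h> and <string.h>. WALK_STATE and the
       walk_new, walk_finish, is_file and is_dir helpers are assumed to exist elsewhere. */
    char *walk_next(WALK_STATE *s)
    {
        if (s->empty)
            return NULL;
    
        for (;;)
        {
            if (s->subgenerator)
            {
                char *found = walk_next(s->subgenerator);
                if (found)
                    return found;
                /* Subdirectory exhausted: clean it up and carry on here. */
                walk_finish(s->subgenerator);
                s->subgenerator = NULL;
            }
    
            /* readdir returns a struct dirent *, or NULL at the end of the directory. */
            struct dirent *entry = readdir(s->dir);
            if (entry == NULL)
                break;
            if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0)
                continue;
    
            char subpath[PATH_MAX];
            snprintf(subpath, sizeof subpath, "%s/%s", s->path, entry->d_name);
            if (is_file(subpath))
                return entry->d_name;
            if (is_dir(subpath))
                s->subgenerator = walk_new(subpath); /* walk_new must copy subpath */
        }
    
        closedir(s->dir);
        s->empty = 1;
        return NULL;
    }
    

    (This is still a sketch: WALK_STATE and the helper functions are left undefined, and error handling is minimal.)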
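
    The LIMIT wrapper mentioned earlier in this answer fits the same pattern. Below is a minimal sketch, assuming a generic "next" function signature; next_fn, struct evens, struct limit and the two functions are all hypothetical names chosen for the illustration, not part of any real database code.

    ```c
    #include <stddef.h>

    /* A generic "next" function: writes one value to *out and returns 1,
       or returns 0 when the sequence is finished. */
    typedef int (*next_fn)(void *state, int *out);

    /* Hypothetical inner generator: multiples of two, never finishes
       (overflow handling omitted for brevity). */
    struct evens { int i; };

    int evens_next(void *state, int *out)
    {
        struct evens *e = state;
        e->i += 2;
        *out = e->i;
        return 1;
    }

    /* The wrapper: forwards to an inner generator until `remaining` hits zero. */
    struct limit {
        next_fn inner_next;
        void   *inner_state;
        size_t  remaining;
    };

    int limit_next(void *state, int *out)
    {
        struct limit *l = state;
        if (l->remaining == 0)
            return 0;                   /* limit reached */
        if (!l->inner_next(l->inner_state, out))
            return 0;                   /* inner sequence ran out first */
        l->remaining--;
        return 1;
    }
    ```

    A caller would set up struct evens e = { 0 } and struct limit l = { evens_next, &e, 10 }, then loop on limit_next(&l, &value). Only the values actually requested are ever computed.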

  • 2021-02-02 13:57

    The basic approach is to not do it. In Python (and C#) the 'yield' statement preserves local state between calls, whereas in C/C++ and most other languages the local state stored on the stack is not preserved between calls, and this is a fundamental implementation difference. So in C you have to store the state between calls in some variable explicitly: either a global (or static) variable, or a parameter to your sequence generator. So either:

    int multiples_of_2() {
       static int i = 0;
       i += 2;
       return i;
    }
    

    or

    int multiples_of_2(int i) {
       i += 2;
       return i;
    }
    

    depending on whether there is one global sequence or many. With the second form, the caller keeps the last value and passes it back in on each call.

    I've quickly considered longjmp and GCC computed gotos and other non-standard things, and I can't say I'd recommend any of them for this! In C, do it the C way.

  • 2021-02-02 13:57

    You can pass the argument as a pointer to allow the function to modify it without using the return value:

    void multiples_of_2(int *i)
    {
        *i += 2;
    }
    

    And call it:

    int i = 0;
    multiples_of_2(&i);
    