How do I remove the stack overflow from this casperjs code (using setTimeout)?

Submitted by ≯℡__Kan透↙ on 2019-12-23 02:10:09

Question


The following sample resembles my actual code:

function runCode() {
    casper.then(function(){
        if (condition){
            return;
        }
    });

    // .... code .....
    // .... code .....

    casper.then(function(){
        setTimeout(runCode(), 1000);
    });
}

function startScript() {
    // .... code ....
    // .... code ....

    casper.then(function(){
        runCode();
    });

    casper.then(function(){
        setTimeout(startScript(),5000);
    });
}

startScript();

This code runs on a VPS and seems to fill up all 512 MB of RAM. It starts at around 50 MB and fills the RAM within a few hours. So I suspect that the way I implement the infinite loop creates new stack frames without destroying the old ones.

How I want to implement this: execution starts with startScript(), which calls another function, runCode(). runCode has to run infinitely in a loop, which I'm trying to do with the setTimeout function.

When a certain condition is reached, the whole script should start again, so I use return to go back to startScript() and then restart it with another setTimeout() call.

That condition has not been hit in my script in the last few hours, so I suspect the memory is consumed inside runCode(). Do you have any suggestions for fixing this memory problem?

Update: I was passing the function's return value (undefined) to setTimeout() instead of the function itself, so the function ran immediately, and this was causing the stack overflow. As suggested by Artjom B., I tried the following code, but the function passed to setTimeout is never invoked.

function runCode() {

    console.log("inside runcode");
    casper.then(function(){
        // ...
        // ...
        // call to other functions
    });

    //setTimeout(runCode, 1000); --------------- [i]

    casper.then(function(){
        console.log("just before setTimeout");
        setTimeout(runCode, 1000);
    });
}
runCode();

I get the following output:

inside runcode
(console.log messages from the other functions and the code in between)
just before setTimeout

Then it exits.

If I instead uncomment the line marked [i] and comment out the lines after it, I get an infinite loop like this:

inside runcode
inside runcode
inside runcode
....

I don't know what is wrong. Any suggestions?

Update 2: Thanks to Artjom B. for spotting another flaw in my code. There seems to be a problem with the setTimeout() function: when I run the code in this paste, http://pastebin.com/W9DD6YpB, it doesn't run infinitely as it is supposed to.

Update 3: As Artjom B. explained, the asynchronous nature of JavaScript makes casper think there is no more code left to execute, so it exits before the function queued by setTimeout is invoked. I'm wondering whether adding some steps afterwards would keep casper from exiting. For example, the function queued by setTimeout() is invoked after 1000 ms, so a casper.wait(2000) should do the job, but I don't know whether there will still be stack overflow problems: http://pastebin.com/ybKWH5KX


Answer 1:


After some discussion in the comments, it became clear that an approach with setTimeout either doesn't work or is hard to read and maintain.

Stack frames

Your concern about uncollected stack frames from the recursive calls to runCode and startScript is unfounded, since CasperJS internally works with setTimeout: each step is started from a timer rather than from inside the previous step, so the call stack doesn't grow. So you should use the functions that CasperJS provides.

You can still do this recursively (nesting steps), because CasperJS handles this well: it keeps the steps in a queue and inserts new steps right after the currently executing step.
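The following is a minimal plain-JavaScript sketch of why nesting steps does not pile up stack frames. StepQueue is a hypothetical stand-in, not the CasperJS API. It keeps steps in an array and inserts steps scheduled by a running step right after that step. It also calls every step from a single loop. CasperJS drives its steps with timers instead, which this sketch leaves out.

```javascript
// Hypothetical mini step queue (NOT the CasperJS API), loosely modelled on
// keeping steps in an array plus the index of the next step to run.
class StepQueue {
  constructor() {
    this.steps = [];      // every scheduled step, executed ones included
    this.next = 0;        // index of the next step to run
    this.insertAt = null; // insertion point while a step is running
    this.depth = 0;       // steps currently on the call stack
    this.maxDepth = 0;    // deepest nesting observed
  }

  then(step) {
    if (this.insertAt === null) {
      this.steps.push(step); // scheduled from outside any step: append
    } else {
      // scheduled from inside a step: run right after that step
      this.steps.splice(this.insertAt, 0, step);
      this.insertAt += 1;
    }
  }

  run() {
    // Steps are always called from this loop, never from each other.
    while (this.next < this.steps.length) {
      const step = this.steps[this.next];
      this.next += 1;
      this.insertAt = this.next;
      this.depth += 1;
      this.maxDepth = Math.max(this.maxDepth, this.depth);
      step.call(this);
      this.depth -= 1;
      this.insertAt = null;
    }
  }
}

const queue = new StepQueue();
let rounds = 0;

function runCode() {
  queue.then(function () {
    rounds += 1; // one round of "work"
  });
  queue.then(function () {
    if (rounds < 10000) {
      runCode(); // only schedules two more steps, then returns
    }
  });
}

runCode();
queue.run();
console.log(rounds, queue.maxDepth, queue.steps.length); // 10000 1 20000
```

The "recursion" runs 10000 times while at most one step is ever on the stack. Note, however, that steps.length ends at 20000, because executed steps are never removed from the array.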

Stop condition

You need to move the stop condition to the recursive call, because in asynchronous code like this

function runCode() {
    casper.then(function(){
        if (condition){
            return;
        }
    });
    //...
}

doesn't actually stop the execution of runCode. The return only exits the callback passed to then, and by the time that callback runs, runCode has already returned after scheduling all of its steps.
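The same effect can be reproduced without CasperJS. In this sketch, scheduleStep is a hypothetical stand-in for casper.then that merely queues the callback. The early return only leaves its own callback, and the second step still runs.

```javascript
// Hypothetical stand-in for casper.then: it only queues the callback.
const queued = [];
function scheduleStep(step) {
  queued.push(step);
}

const log = [];
const condition = true; // pretend the stop condition is already met

function runCode() {
  scheduleStep(function () {
    if (condition) {
      return; // exits this callback only
    }
  });
  scheduleStep(function () {
    log.push("later step ran anyway");
  });
  log.push("runCode returned");
}

runCode();
// By now runCode has finished; the queued steps only run afterwards.
queued.forEach(function (step) {
  step();
});
console.log(log); // [ 'runCode returned', 'later step ran anyway' ]
```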

Replace setTimeout

You would then replace setTimeout in:

function runCode() {
    //...
    casper.then(function(){
        if (!condition){
            setTimeout(runCode, 1000);
        }
    });
}

with the proper casper functions:

function runCode() {
    //...
    casper.wait(1000);
    casper.then(function(){
        if (!condition){
            runCode();
        }
    });
}

You need to do the same replacement in startScript from this:

casper.then(function(){
    setTimeout(startScript,5000);
});

to

casper.wait(5000);
casper.then(function(){
    startScript();
});
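The replacements above can be combined into a complete control flow and simulated with a hypothetical StepQueue (not the CasperJS API). In this sketch, wait only advances a virtual clock, so the whole thing finishes instantly. The restart condition (every 5th round) and the limit of 3 restarts are assumptions made only so the sketch terminates.

```javascript
// Hypothetical mini step queue (NOT the CasperJS API). wait() only advances
// a virtual clock, so the sketch finishes instantly instead of sleeping.
class StepQueue {
  constructor() {
    this.steps = [];      // every scheduled step
    this.next = 0;        // index of the next step to run
    this.insertAt = null; // insertion point while a step is running
    this.clock = 0;       // simulated milliseconds waited so far
  }

  then(step) {
    if (this.insertAt === null) {
      this.steps.push(step);
    } else {
      this.steps.splice(this.insertAt, 0, step); // right after the running step
      this.insertAt += 1;
    }
  }

  wait(ms) {
    this.then(function () {
      this.clock += ms; // a real wait would pause here
    });
  }

  run() {
    while (this.next < this.steps.length) {
      const step = this.steps[this.next++];
      this.insertAt = this.next;
      step.call(this);
      this.insertAt = null;
    }
  }
}

const casperLike = new StepQueue();
let restarts = 0;
let rounds = 0;

function runCode() {
  casperLike.then(function () {
    rounds += 1; // one round of work
  });
  casperLike.wait(1000);
  casperLike.then(function () {
    const condition = rounds % 5 === 0; // hypothetical restart condition
    if (!condition) {
      runCode(); // keep looping; the stop condition guards the recursive call
    }
  });
}

function startScript() {
  restarts += 1;
  casperLike.then(function () {
    runCode();
  });
  casperLike.wait(5000);
  casperLike.then(function () {
    if (restarts < 3) { // limit exists only so the sketch terminates
      startScript();
    }
  });
}

startScript();
casperLike.run();
console.log(restarts, rounds, casperLike.clock); // 3 15 30000
```

It prints 3 15 30000: startScript ran three times, with five rounds each. The clock shows 15 × 1000 ms plus 3 × 5000 ms of simulated waiting.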

On keeping setTimeout

If you really want to keep setTimeout, you need to do double bookkeeping, because calling a function with setTimeout breaks out of the controlled flow of casper steps.

For example, you may do something like this:

function someFunction(){
    casper.then(function(){
        // something
    });
}
casper.start(url);
casper.then(function(){
    setTimeout(someFunction, 5000);
});
casper.run();

The function inside then is the last scheduled step. When it executes, it creates a timer that will later call a function, which in turn adds more steps to the flow. That never happens. Casper has no way of knowing that more steps will be scheduled, and none are pending when that last then step finishes (it is the last step before run), so casper simply exits the whole script. The underlying PhantomJS may behave differently on some platforms. setTimeout lets you break out of the control flow, which, as in this case, is usually not what you want.
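The early exit can be reproduced with a hypothetical StepQueue (not the CasperJS API). Like a casper run, it finishes as soon as no step is pending, and it does not count outstanding timers. The timer started by the last step fires after the run is over, so the step it schedules is stranded.

```javascript
// Hypothetical mini step queue (NOT the CasperJS API). Like a casper run, it
// stops as soon as no step is pending; outstanding timers are not counted.
class StepQueue {
  constructor() {
    this.steps = [];
    this.next = 0;
    this.insertAt = null;
    this.finished = false;
  }

  then(step) {
    if (this.insertAt === null) {
      this.steps.push(step);
    } else {
      this.steps.splice(this.insertAt, 0, step);
      this.insertAt += 1;
    }
  }

  run() {
    while (this.next < this.steps.length) {
      const step = this.steps[this.next++];
      this.insertAt = this.next;
      step.call(this);
      this.insertAt = null;
    }
    this.finished = true; // nothing pending: the run is over
  }
}

const queue = new StepQueue();
let somethingRan = false;

function someFunction() {
  queue.then(function () {
    somethingRan = true; // the "something" step
  });
}

queue.then(function () {
  setTimeout(someFunction, 50); // leaves the controlled flow of steps
});
queue.run();
console.log(queue.finished); // true

setTimeout(function () {
  // someFunction has queued its step by now, but no run will ever execute it.
  console.log(somethingRan, queue.steps.length - queue.next); // false 1
}, 100);
```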

To gain control back you may do the following as indicated in your paste:

function someFunction(){
    casper.then(function(){
        // something
    });
}
casper.start(url);
casper.then(function(){
    setTimeout(someFunction, 5000);
});
casper.wait(5100); // should be greater than the previous timeout
casper.run();

Do not do this; it is hard to read and error-prone. With someFunction defined as above, it can be simplified to:

casper.start(url);
casper.wait(5000, someFunction); // added bonus: inside someFunction, "this" refers to casper
casper.run();

Proper callback invocation for setTimeout

You also have a problem with how the function is passed to setTimeout: in effect, you don't use setTimeout at all. See for example the line

setTimeout(startScript(),5000);

Here the () invokes startScript immediately, without any delay, and its return value is passed to setTimeout. startScript doesn't return anything, so that value is undefined. setTimeout (at least in this environment) takes the undefined without issuing a warning or error, but it can't execute it after the timeout, because it isn't a function. In JavaScript, functions are first-class citizens, so you can pass the function object itself to other functions.

You can fix this by removing () from the above line:

setTimeout(startScript,5000);

The same goes for setTimeout(runCode(), 1000), which should be:

setTimeout(runCode, 1000);
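Here is a plain-JavaScript sketch of the difference between calling the function and passing it. It deliberately does not hand undefined to setTimeout, because runtimes differ there: Node.js, for example, throws a TypeError instead of silently ignoring it.

```javascript
let calls = 0;

function startScript() {
  calls += 1;
  // no return statement, so every call evaluates to undefined
}

// Wrong: the () runs startScript immediately; what would be handed to
// setTimeout is its return value, undefined, not a function.
const wrongArgument = startScript();
console.log(calls, typeof wrongArgument); // 1 undefined

// Right: hand over the function object; setTimeout calls it later.
setTimeout(startScript, 100);
console.log(calls); // 1 (the timer has not fired yet)
```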

(untested) Solution for removing previous casper steps

You really should run the script periodically from cron (or something similar) without the recursion. If you don't want that, you may still be able to reduce the memory consumption.

The steps scheduled via then*, wait* and some other functions are kept in the internal casper.steps array, and they are not removed after they have been executed. That may be the cause of your memory leak. You may try to clear them like this:

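casper.clearSomeSteps = function(min, keep){
    min = min || 1000; // only trim when at least 1000 steps are stored
    keep = keep || 100; // keep the 100 most recently executed steps
    if (this.steps.length < min) return; // not yet needed

    // this.step should be the index of the next step to run, so only the
    // steps before it have been executed; never remove pending steps
    var removable = this.step - keep;
    if (removable <= 0) return;

    this.steps = this.steps.slice(removable); // drop the oldest executed steps
    this.step -= removable; // shift the index of the next step accordingly
};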

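Call casper.clearSomeSteps() at the beginning of startScript. Use casper there rather than this, because startScript is called as a plain function, so this does not refer to casper inside it. This might not be the whole solution, though, since there are also casper.waiters.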
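The index bookkeeping in clearSomeSteps can be checked in isolation. trimExecutedSteps is a hypothetical pure function, not part of CasperJS, that does the same arithmetic on a plain array. Its next argument plays the role of casper.step, and only steps before that index are dropped.

```javascript
// Hypothetical helper (not part of CasperJS): drop already-executed steps
// from the front while keeping `keep` of the most recent executed ones,
// and shift the index of the next step by the same amount.
function trimExecutedSteps(steps, next, keep) {
  // Only steps before `next` have run; pending steps are never dropped.
  const removable = Math.max(0, next - keep);
  return {
    steps: steps.slice(removable),
    next: next - removable,
  };
}

const steps = Array.from({ length: 1200 }, (_, i) => "step" + i);
const trimmed = trimExecutedSteps(steps, 1150, 100);
console.log(trimmed.steps.length, trimmed.next, trimmed.steps[trimmed.next]);
// 150 100 step1150
```

The step that was about to run ("step1150") is still the one at the new index, and all 50 pending steps survive.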



Source: https://stackoverflow.com/questions/24637045/how-do-i-remove-the-stack-overflow-from-this-casperjs-code-using-settimeout
