Question
The following sample resembles my actual code:
function runCode() {
    casper.then(function(){
        if (condition){
            return;
        }
    });
    .... code .....
    .... code .....
    casper.then(function(){
        setTimeout(runCode(), 1000);
    });
}
function startScript() {
    .... code ....
    .... code ....
    casper.then(function(){
        runCode();
    });
    casper.then(function(){
        setTimeout(startScript(),5000);
    });
}
startScript();
This code is running on a VPS and it seems to fill up all 512 MB of RAM. It starts at around 50 MB and fills up the memory within a few hours. So I suspect the way I'm implementing the infinite loop is creating new stack frames without destroying the old ones.
How I want to implement this: execution starts with startScript(), and from inside startScript() it calls another function, runCode(). This runCode function has to run infinitely in a loop. I'm trying to do that using the setTimeout function.
There is a condition upon reaching which the whole script has to start again, so I'm using return to go back to the startScript() function and then restart it with another setTimeout() call.
The specific condition I'm talking about has not been encountered in my script in the last few hours, so I suspect the memory usage comes from within the runCode() function. Please give me some suggestions to fix this memory usage problem.
Update:
I was passing the function's return value (which was null or undefined) as the argument to setTimeout(), and for this the function had to run once, which was causing the stack overflow. As suggested by Artjom B., I tried the following code, but the function passed as argument to setTimeout is not being invoked.
function runCode() {
    console.log("inside runcode");
    casper.then(function(){
        ...
        ...
        // call to other functions
    });
    //setTimeout(runCode, 1000); --------------- [i]
    casper.then(function(){
        console.log("just before setTimeout");
        setTimeout(runCode, 1000);
    });
}
runCode();
I get the following output:
inside runcode
console.log messages from the other functions and codes in between.
just before setTimeout
Then it exits.
If I use the commented-out line marked [i] and comment out the lines after it, I get an infinite loop like this:
inside runcode
inside runcode
inside runcode
....
....
I don't know what is wrong. Please suggest something.
Update 2: Thank you Artjom B. for picking up another flaw in my code.
There seems to be a problem with the setTimeout() function. When I run the code in this paste: http://pastebin.com/W9DD6YpB, it doesn't seem to run infinitely as it is supposed to.
Update 3: As explained by Artjom B., the asynchronous nature of JavaScript makes casper think there is no more code left to execute, so it exits before the function queued by setTimeout gets invoked.
I'm wondering if adding some code afterwards will keep casper from exiting. For example, the function queued by setTimeout() waits 1000 ms to be invoked, so a casper.wait(2000) should do the job, but I don't know if there will still be stack overflow problems: http://pastebin.com/ybKWH5KX
Answer 1:
After some discussion in the comments, it became clear that an approach with setTimeout doesn't work, or is at best hard to read and maintain.
Stack frames
Your concern about uncollected stack frames from recursively calling runCode and startScript is unfounded, since CasperJS internally works with setTimeout. So you should use the functions that are provided by CasperJS.
You can do this recursively (nesting of steps), because CasperJS handles this well: it uses a queue and inserts new steps after the currently executing step.
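To illustrate why nesting steps does not pile up stack frames, here is a minimal sketch of a step queue in plain JavaScript. It is a toy model and not CasperJS's actual implementation: StepQueue, its then and run methods, and the counters are hypothetical names made up for this illustration.

```javascript
// Toy model of a step queue (hypothetical, NOT CasperJS's real code).
function StepQueue() {
    this.steps = [];
    this.index = 0;       // index of the next step to run
    this.insertAt = null; // insertion point while a step is executing
}

StepQueue.prototype.then = function (fn) {
    if (this.insertAt === null) {
        this.steps.push(fn); // nothing is running: append at the end
    } else {
        // A step is running: insert right after it, keeping nested order.
        this.steps.splice(this.insertAt, 0, fn);
        this.insertAt += 1;
    }
    return this;
};

StepQueue.prototype.run = function () {
    // Each step is called from this loop, never from inside another step.
    while (this.index < this.steps.length) {
        var step = this.steps[this.index];
        this.index += 1;
        this.insertAt = this.index;
        step.call(this);
        this.insertAt = null;
    }
};

var queue = new StepQueue();
var iterations = 0;
var depth = 0;    // how many runCode calls are active right now
var maxDepth = 0; // the deepest nesting of runCode that was observed

function runCode() {
    depth += 1;
    maxDepth = Math.max(maxDepth, depth);
    iterations += 1;
    queue.then(function () {
        if (iterations < 10000) {
            runCode(); // "recursive" call: only registers another step
        }
    });
    depth -= 1;
}

runCode();
queue.run();
console.log(iterations, maxDepth, queue.steps.length);
```

In this model the nesting depth of runCode never exceeds 1, because each call only registers a step and returns before that step runs. The steps array of the toy queue does keep growing, though, by one entry per iteration.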
Stop condition
You would need to move the stop condition to the recursive call, because in asynchronous code like this, the following
function runCode() {
    casper.then(function(){
        if (condition){
            return;
        }
    });
    //...
}
doesn't actually stop the execution of runCode, because the return only leaves the function inside the then block.
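The effect can be reproduced without CasperJS. The following sketch assumes a hypothetical defer helper that stores callbacks and runs them later, which stands in for a step scheduler; it is only meant to show that a return inside the callback does not end the outer function.

```javascript
// Hypothetical stand-in for a step scheduler: callbacks are stored, then run later.
var pending = [];
function defer(callback) {
    pending.push(callback);
}

var log = [];

function runCode(condition) {
    defer(function () {
        if (condition) {
            log.push("callback returned early");
            return; // leaves only this inner function
        }
        log.push("callback ran to the end");
    });
    // This line runs regardless of the condition, and before the callback.
    log.push("rest of runCode ran");
}

runCode(true);
pending.forEach(function (callback) { callback(); });
console.log(log.join(" | "));
// prints "rest of runCode ran | callback returned early"
```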
Replace setTimeout
You would then replace setTimeout in:
function runCode() {
    //...
    casper.then(function(){
        if (!condition){
            setTimeout(runCode, 1000);
        }
    });
}
with the proper casper functions:
function runCode() {
    //...
    casper.wait(1000);
    casper.then(function(){
        if (!condition){
            runCode();
        }
    });
}
You need to do the same replacement in startScript, from this:
casper.then(function(){
    setTimeout(startScript,5000);
});
to this:
casper.wait(5000);
casper.then(function(){
    startScript();
});
On keeping setTimeout
If you really want to keep setTimeout, then you would need to do double bookkeeping. By calling a function with setTimeout you break out of the controlled flow of casper steps.
For example, you may do something like this:
function someFunction(){
    casper.then(function(){
        // something
    });
}
casper.start(url);
casper.then(function(){
    setTimeout(someFunction, 5000);
});
casper.run();
The function inside then is actually the last scheduled step. When it is executed, it creates a timer that would later start a function which in turn would add more steps to the flow. This never happens, because casper has no way of knowing that more steps will be scheduled, and since there currently aren't any (at the end of the then before run), it simply exits the whole script, although on some platforms the underlying PhantomJS might behave differently. setTimeout lets you break out of the control flow, which is not desirable in this case.
To gain control back you may do the following as indicated in your paste:
function someFunction(){
    casper.then(function(){
        // something
    });
}
casper.start(url);
casper.then(function(){
    setTimeout(someFunction, 5000);
});
casper.wait(5100); // should be greater than the previous timeout
casper.run();
^ Do not do this. It is hard to read and error-prone. This can be simplified to:
casper.start(url);
casper.then(function(){
    // something
});
casper.wait(5000, someFunction); // added bonus because "this" now refers to casper
casper.run();
Proper callback invocation for setTimeout
You also have a problem with how the function is handed to setTimeout. The main problem is that you don't actually use setTimeout as intended. See for example the line:
setTimeout(startScript(),5000);
Here you invoke the startScript function without delay, because of the (), and pass its return value into the setTimeout function. I don't think you actually return anything from startScript. setTimeout will take the undefined without issuing a warning or error, but can't execute it after the timeout, because it isn't actually a function. In JavaScript, functions are first-class citizens: you can pass the function object into other functions.
You can fix this by removing the () from the above line:
setTimeout(startScript,5000);
The same goes for
setTimeout(runCode, 1000);
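The difference between the two forms can be made visible with a small sketch. recordTimeout is a hypothetical stand-in for setTimeout that only records what it receives, so nothing is actually scheduled; the counter is likewise made up for this illustration.

```javascript
// Hypothetical stand-in for setTimeout: it only records what it was given.
function recordTimeout(callback, delay) {
    return { callbackType: typeof callback, delay: delay };
}

var calls = 0;
function startScript() {
    calls += 1; // returns undefined, like the function in the question
}

// Wrong: startScript() runs immediately and its return value is passed on.
var wrong = recordTimeout(startScript(), 5000);
var callsAfterWrong = calls;

// Right: the function object itself is passed and nothing runs yet.
var right = recordTimeout(startScript, 5000);
var callsAfterRight = calls;

console.log(wrong.callbackType, callsAfterWrong); // prints "undefined 1"
console.log(right.callbackType, callsAfterRight); // prints "function 1"
```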
(untested) Solution for removing previous casper steps
You really should run the script from cron, or something like that, without the recursion. If you really don't want that, you may still be able to reduce the memory consumption.
The steps that are scheduled via then*, wait* and some other functions are managed in the internal casper.steps property. They are not cleared once they have been executed, so that may be the reason for your memory leak. You may try to clear them like this:
casper.clearSomeSteps = function(min, keep){
    var len = casper.steps.length;
    min = min || 1000; // only run when at least 1000 steps are scheduled
    keep = keep || 100; // keep 100 of the newer steps
    if (len < min) return; // not yet needed
    this.step -= len-keep; // change the index of the current step
    this.steps = Array.prototype.slice.call(this.steps, len-keep); // do the slice
};
Call this.clearSomeSteps() at the beginning of startScript. This might not be the whole solution, though, as there are also casper.waiters.
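The index arithmetic of the trimming can be checked in isolation. The sketch below applies the same logic to a plain object instead of a casper instance; trimSteps and flow are hypothetical names, and it assumes that the current step lies within the part of the array that is kept (otherwise the index would become negative).

```javascript
// Stand-alone variant of the trimming logic; `flow` is a hypothetical plain object.
function trimSteps(flow, min, keep) {
    var len = flow.steps.length;
    min = min || 1000;  // only trim when at least this many steps exist
    keep = keep || 100; // number of newest steps to keep
    if (len < min) {
        return false;   // not yet needed
    }
    var dropped = len - keep;
    flow.step -= dropped;                   // shift the current index
    flow.steps = flow.steps.slice(dropped); // drop the oldest steps
    return true;
}

// Build 1200 dummy steps and pretend step 1195 is the current one.
var flow = { steps: [], step: 1195 };
for (var i = 0; i < 1200; i += 1) {
    flow.steps.push("step " + i);
}

var currentBefore = flow.steps[flow.step];
var trimmed = trimSteps(flow);
var currentAfter = flow.steps[flow.step];

console.log(trimmed, flow.steps.length, flow.step, currentBefore === currentAfter);
// prints "true 100 95 true"
```

After trimming, the shifted index still points at the same step, which is the property the slice has to preserve.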
Source: https://stackoverflow.com/questions/24637045/how-do-i-remove-the-stack-overflow-from-this-casperjs-code-using-settimeout