Question
I'm experimenting with Puppeteer Cluster and I just don't understand how to use queuing properly. Can it only be used for calls where you don't wait for a response? I'm using Artillery to fire a bunch of requests simultaneously. With queue they all fail, whereas only some fail when I call execute directly.
I've taken the code straight from the examples and replaced execute with queue, which I expected to work, except the code doesn't wait for the result. Is there a way to achieve this anyway?
So this works:
const screen = await cluster.execute(req.query.url);
But this breaks:
const screen = await cluster.queue(req.query.url);
Here's the full example with queue:
const express = require('express');
const app = express();
const { Cluster } = require('puppeteer-cluster');
(async () => {
    const cluster = await Cluster.launch({
        concurrency: Cluster.CONCURRENCY_CONTEXT,
        maxConcurrency: 2,
    });
    await cluster.task(async ({ page, data: url }) => {
        // make a screenshot
        await page.goto('http://' + url);
        const screen = await page.screenshot();
        return screen;
    });
    // setup server
    app.get('/', async function (req, res) {
        if (!req.query.url) {
            return res.end('Please specify url like this: ?url=example.com');
        }
        try {
            const screen = await cluster.queue(req.query.url);
            // respond with image
            res.writeHead(200, {
                'Content-Type': 'image/jpg',
                'Content-Length': screen.length // variable is undefined here
            });
            res.end(screen);
        } catch (err) {
            // catch error
            res.end('Error: ' + err.message);
        }
    });
    app.listen(3000, function () {
        console.log('Screenshot server listening on port 3000.');
    });
})();
What am I doing wrong here? I'd really like to use queuing because without it every incoming request appears to slow down all the other ones.
Answer 1:
Author of puppeteer-cluster here.
Quote from the docs:
cluster.queue(..): [...] Be aware that this function only returns a Promise for backward compatibility reasons. This function does not run asynchronously and will immediately return.
cluster.execute(...): [...] Works like Cluster.queue, just that this function returns a Promise which will be resolved after the task is executed. In case an error happens during the execution, this function will reject the Promise with the thrown error. There will be no "taskerror" event fired.
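To make the quoted difference concrete, here is a minimal sketch built on a hypothetical MiniCluster stand-in. It is not puppeteer-cluster's implementation and starts no browser; the class, its internals, and the sample task are assumptions made purely for illustration. Only the method names task, queue, and execute and the taskerror event mirror the documented API.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in that mimics only the documented queue/execute contract.
class MiniCluster extends EventEmitter {
  task(fn) {
    this.taskFn = fn;
  }

  // Like cluster.queue: schedules the job and returns right away.
  // The returned Promise resolves to undefined; failures go to 'taskerror'.
  async queue(data) {
    setImmediate(async () => {
      try {
        await this.taskFn({ data });
      } catch (err) {
        this.emit('taskerror', err, data);
      }
    });
  }

  // Like cluster.execute: resolves with the task's return value,
  // or rejects with the error the task threw.
  execute(data) {
    return new Promise((resolve, reject) => {
      setImmediate(async () => {
        try {
          resolve(await this.taskFn({ data }));
        } catch (err) {
          reject(err);
        }
      });
    });
  }
}

async function demo() {
  const cluster = new MiniCluster();
  cluster.task(async ({ data: url }) => {
    if (!url) throw new Error('no url given');
    return 'screenshot of ' + url;
  });

  const events = [];
  cluster.on('taskerror', (err) => events.push(err.message));

  const queued = await cluster.queue('example.com');     // no result comes back
  const executed = await cluster.execute('example.com'); // the task's return value

  await cluster.queue(''); // the failure is only reported through the event
  let rejected = null;
  try {
    await cluster.execute(''); // the failure rejects the awaited Promise
  } catch (err) {
    rejected = err.message;
  }
  return { queued, executed, events, rejected };
}
```

Awaiting queue here yields nothing useful, which is the same symptom as the undefined screen variable in the question, while execute hands back the value the task returned.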
When to use which function:
- Use cluster.queue if you want to queue a large number of jobs (e.g. a list of URLs). The task function needs to take care of storing the results by printing them to the console or storing them in a database.
- Use cluster.execute if your task function returns a result. This will still queue the job, so this is like calling queue in addition to waiting for the job to finish. In this scenario, there is most often an "idling cluster" present which is used when a request hits the server (like in your example code).
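For the first case, a batch job might look roughly like the sketch below. It assumes an already launched cluster is passed in as a parameter. The function name screenshotAll and the saveResult callback are hypothetical and stand for whatever storage you use. The calls cluster.task, cluster.queue, cluster.idle, and cluster.close and the taskerror event come from the puppeteer-cluster documentation.

```javascript
// Queue a list of URLs; the task itself is responsible for storing results.
// `cluster` is a launched puppeteer-cluster instance; `saveResult` is a
// hypothetical async callback (e.g. write to disk or to a database).
async function screenshotAll(cluster, urls, saveResult) {
  const failed = [];

  // Errors from queued jobs are not thrown; they arrive as 'taskerror' events.
  cluster.on('taskerror', (err, data) => {
    failed.push({ url: data, message: err.message });
  });

  await cluster.task(async ({ page, data: url }) => {
    await page.goto('http://' + url);
    const screen = await page.screenshot();
    await saveResult(url, screen); // nothing is returned to the caller
  });

  for (const url of urls) {
    cluster.queue(url); // returns immediately, there is no result to await
  }

  await cluster.idle();  // wait until all queued jobs have been processed
  await cluster.close();
  return failed;
}
```

Passing the cluster in rather than launching it inside the function is a design choice made here so the sketch can be exercised with a fake cluster; in a real script you would typically call Cluster.launch first and hand over the result.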
So, you definitely want to use cluster.execute, as you want to wait for the results of the task function. The reason you do not see any errors is (as quoted above) that the errors of the cluster.queue function are emitted via a taskerror event. The cluster.execute errors are directly thrown (the Promise is rejected). Most likely, your jobs fail in both cases, but it is only visible for cluster.execute.
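Applied to the route in the question, the fix could be sketched as follows. The factory name makeScreenshotHandler is hypothetical, and taking the cluster as an argument is an assumption made so the handler can be tried without a browser. The 500 status on failure and the image/png content type (the default format of page.screenshot) are choices of this sketch, not something the original code does.

```javascript
// Builds an Express-style route handler around an already launched cluster.
function makeScreenshotHandler(cluster) {
  return async function handler(req, res) {
    if (!req.query.url) {
      return res.end('Please specify url like this: ?url=example.com');
    }
    try {
      // execute() queues the job AND resolves with the task's return value.
      const screen = await cluster.execute(req.query.url);
      res.writeHead(200, {
        'Content-Type': 'image/png', // page.screenshot() produces PNG by default
        'Content-Length': screen.length,
      });
      res.end(screen);
    } catch (err) {
      // Task failures arrive here because execute() rejects its Promise.
      res.writeHead(500, { 'Content-Type': 'text/plain' });
      res.end('Error: ' + err.message);
    }
  };
}
```

With the setup from the question, it would be registered as app.get('/', makeScreenshotHandler(cluster)) after the cluster.task call. Requests are still queued behind maxConcurrency, so execute does not bypass the queue.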
Source: https://stackoverflow.com/questions/57361073/puppeteer-cluster-queue-instead-of-execute