Node.js + Cheerio: Request inside a loop

血红的双手。 Posted on 2021-02-07 10:45:25

Question


I'm using cheerio, request and Node.js.

When I run the script below, it prints the names in the wrong order. I believe this is caused by the asynchronous nature of the requests. How can I make it print them in the "right" order? Do I need a sync package, or is there a way to change the code so that it works synchronously?

app.get('/returned', function (req, res) {
    for (var y = 0; y < 10; y++) {
        var url = "http://example.com" + y + "/person.html";
        request(url, function (err, resp, body) {
            $ = cheerio.load(body);
            var links = $('#container');
            var name = links.find('span[itemprop="name"]').html(); // name
            if (name == null) {
                console.log("returned null");
            } else {
                console.log(name);
            }

        });
    }
});

Answer 1:


Promises make this relatively easy:

app.get('/returned', function (req, res) {
    let urls = [];
    for (let y = 0; y < 10; y++) {
        urls.push('http://example.com' + y + '/person.html');
    }
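    // Promise.all resolves with results in the same order as `urls`,
    // regardless of which request finishes first.
    Promise.all(urls.map(function (url) {
        // Wrap the callback-style request() call in a promise.
        return new Promise(function (resolve, reject) {
            request(url, function (err, resp, body) {
                if (err) {return reject(err);}
                let $ = cheerio.load(body);
                let links = $('#container');
                let name = links.find('span[itemprop="name"]').html(); // name
                resolve({name: name, links: links, url: url});
            });
        });
    })).then(function (result) {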
        result.forEach(function (obj) {
            if (obj.name == null) {
                console.log(obj.url, "returned null");
            } else {
                console.log(obj.url, obj.name);
            }
        });
    }).catch(function (err) {
        console.log(err);
    });
});

I started by creating an array of URLs to fetch, then mapped it to an array of promises. When each request completes, its promise is resolved with the name, URL, and links. When all promises have resolved, I loop over the result, which will be in the original order, because Promise.all returns results in the order of the input array rather than in completion order. The requests themselves run in parallel.
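The ordering guarantee can be checked without any network access. The following is a minimal sketch, not part of the original answer: `fakeRequest` and `completionOrder` are hypothetical names, and `setTimeout` stands in for `request` so that later URLs deliberately finish first.

```javascript
// Records the order in which the simulated requests actually finish.
const completionOrder = [];

// Hypothetical stand-in for request(): resolves with the URL after `delay` ms.
function fakeRequest(url, delay) {
    return new Promise(function (resolve) {
        setTimeout(function () {
            completionOrder.push(url);
            resolve(url);
        }, delay);
    });
}

const urls = [
    'http://example.com0/person.html',
    'http://example.com1/person.html',
    'http://example.com2/person.html'
];

// Later URLs get shorter delays (60, 40, 20 ms), so they finish first.
const pending = urls.map(function (url, i) {
    return fakeRequest(url, (urls.length - i) * 20);
});

// The results array follows the order of `pending`, not completion order.
const done = Promise.all(pending).then(function (results) {
    console.log('finished in this order:', completionOrder);
    console.log('results in this order: ', results);
    return results;
});
```

Here the requests finish in reverse, yet `results` still matches `urls`. Note that Promise.all rejects as soon as any one promise rejects, so a single failed request sends control to the `catch` handler.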




Answer 2:


Nope, you shouldn't have to use a sync package. IMO the cleanest way is to use a mature third-party library.

I'd recommend async.

The async.series method executes the request functions one at a time, in the order they are given, and lets you register a callback that fires when all requests have completed or when an error has occurred.

https://github.com/caolan/async#seriestasks-callback
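The one-at-a-time behaviour described above can also be sketched without the library. The example below is not async.series itself; it swaps in a plain async/await loop that gives the same sequential ordering. `fetchName`, `fetchInSeries` and `eventLog` are hypothetical names, and a timer stands in for the real request and cheerio parsing.

```javascript
// Records when each simulated request starts and ends.
const eventLog = [];

// Hypothetical stand-in for request() plus cheerio parsing.
function fetchName(url) {
    return new Promise(function (resolve) {
        eventLog.push('start ' + url);
        setTimeout(function () {
            eventLog.push('end ' + url);
            resolve('name from ' + url);
        }, 10);
    });
}

// Runs the requests strictly one after another, in the order given.
async function fetchInSeries(urlList) {
    const names = [];
    for (const url of urlList) {
        // The next request does not start until this one has finished.
        names.push(await fetchName(url));
    }
    return names;
}

const seriesDone = fetchInSeries(['u0', 'u1', 'u2']).then(function (names) {
    names.forEach(function (name) {
        console.log(name);
    });
    return names;
});
```

The trade-off against the Promise.all approach is speed: running in series keeps the output order trivially correct, but total time is the sum of all request times rather than roughly the slowest one.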



Source: https://stackoverflow.com/questions/33506986/node-js-cheerio-request-inside-a-loop
