Question
I'm trying to scrape all the URLs from a website and put them into an array, and I have a question about array indexing. If I use an index, such as array[2], the console prints "undefined". If I remove the index and print array on its own, it prints all the URLs line by line. I want each URL to have its own index, like this:
- array[0] = First URL found
- array[1] = Second URL found
- array[2] = Third URL found, etc.
Can anyone point me in the right direction? Thank you.
var request = require('request');
var cheerio = require('cheerio');
var url = 'http://www.hobo-web.co.uk/';
request(url, function(err, resp, body){
    $ = cheerio.load(body);
    links = $('a'); //use your CSS selector here
    $(links).each(function(i, link){
        var array = $(link).attr('href');
        console.log(array[2]);
    });
});
Answer 1:
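You need to create the array once, outside the .each loop, so that it stays
accessible inside the loop, and then push each href value onto it. In your code,
var array is reassigned on every iteration to a single href string, so array[2]
reads the third character of that string (or undefined if the string is shorter)
rather than the third URL.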
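var request = require('request');
var cheerio = require('cheerio');
var url = 'http://www.hobo-web.co.uk/';
var array = []; // declared once, outside the loop

request(url, function(err, resp, body){
    if (err) throw err;
    var $ = cheerio.load(body);
    var links = $('a');
    $(links).each(function(i, link){
        var href = $(link).attr('href');
        array.push(href); // one URL per index
    });
    // The request is asynchronous, so read the array inside the callback.
    console.log(array[2]); // the third URL, if the page has at least three links
});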
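The difference between the two versions can be seen without any network request. The following is a minimal sketch in plain JavaScript, assuming a hypothetical list of hrefs (sampleHrefs) in place of real scraped values. The helper names indexIntoEachString and collect are made up for illustration and are not part of cheerio or request.

```javascript
// Hypothetical sample values standing in for the hrefs a scraper might find.
var sampleHrefs = ['/', '/seo-tutorial/', 'http://example.com/contact'];

// Mirrors the question's loop: the variable holds one string per iteration,
// so [2] reads a character of that string, not the third URL.
function indexIntoEachString(hrefs) {
    return hrefs.map(function (href) {
        return href[2];
    });
}

// Mirrors the answer: one array declared up front, one push per href.
function collect(hrefs) {
    var array = [];
    hrefs.forEach(function (href) {
        array.push(href);
    });
    return array;
}

console.log(indexIntoEachString(sampleHrefs)); // undefined, 'e', 't'
console.log(collect(sampleHrefs)[2]);          // http://example.com/contact
```

The first call shows where "undefined" can come from: the string '/' has no character at index 2. The second call returns the third URL because each href occupies its own slot in the array.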
Source: https://stackoverflow.com/questions/42940845/scraping-urls-from-a-web-page-with-node-js