How to iterate through a supermarket website and get the product names and prices?

会有一股神秘感。 Submitted on 2021-01-29 02:16:07

Question


I'm trying to obtain all the product names and prices from all the categories of a supermarket website. All the tutorials I have found do it for just one constant URL, but I need to iterate through all of them. So far I have this:

const puppeteer = require('puppeteer');

async function scrapeProduct(url) {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url);

    const [el2] = await page.$x('//*[@id="product-nonfood-page"]/main/div/div/div[1]/div[1]/div/div[2]/h1/div');
    const text2 = await el2.getProperty('textContent');
    const name = await text2.jsonValue();

    const [el] = await page.$x('//*[@id="product-nonfood-page"]/main/div/div/div[1]/div[1]/div/div[2]/div[2]/div[1]/div[2]/p[1]/em[2]/strong/text()');
    const text = await el.getProperty('textContent');
    const price = await text.jsonValue();

    console.log({name,price});

    await browser.close();
}

scrapeProduct('https://www.jumbo.com.ar/gaseosa-sprite-sin-azucar-lima-limon-1-25-lt/p'); 

This works for just one product. I'm using Node.js and Puppeteer. How can I achieve this?


Answer 1:


You can try a for...of loop that reuses a single browser instance and a single page, so the scraper visits one URL at a time and is less likely to overload the server:

const puppeteer = require('puppeteer');

(async function main() {
  // Declared outside try so that finally can always close it.
  let browser;
  try {
    browser = await puppeteer.launch();
    // Reuse the blank tab that Puppeteer opens on launch.
    const [page] = await browser.pages();

    const urls = [
      'https://www.jumbo.com.ar/gaseosa-sprite-sin-azucar-lima-limon-1-25-lt/p',
      // ...
    ];

    // for...of awaits each iteration, so pages are loaded one after another.
    for (const url of urls) {
      await page.goto(url);

      // Product name
      const [el2] = await page.$x('//*[@id="product-nonfood-page"]/main/div/div/div[1]/div[1]/div/div[2]/h1/div');
      const text2 = await el2.getProperty('textContent');
      const name = await text2.jsonValue();

      // Product price
      const [el] = await page.$x('//*[@id="product-nonfood-page"]/main/div/div/div[1]/div[1]/div/div[2]/div[2]/div[1]/div[2]/p[1]/em[2]/strong/text()');
      const text = await el.getProperty('textContent');
      const price = await text.jsonValue();

      console.log({name,price});
    }
  } catch (err) {
    console.error(err);
  } finally {
    // Close the browser even if a page failed, so the process can exit.
    if (browser) await browser.close();
  }
})();
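
The urls array above is filled in by hand. To cover whole categories, one option is to collect the product links from each category page first. The following is only a sketch under assumptions: collectProductUrls and collectFromCategories are hypothetical helper names, and the filter assumes that product links are ordinary anchors whose path ends in "/p", as in the example URL from the question. The real markup may differ, and category pages that paginate or lazy-load products would need extra handling.

```javascript
// Hypothetical helper: gather product-page URLs from one category listing.
// Assumption: product links are plain <a href> elements whose path ends in "/p",
// as in the example URL from the question. Adjust the filter to the real markup.
async function collectProductUrls(page, categoryUrl) {
  await page.goto(categoryUrl);

  // Runs in the browser: read the resolved (absolute) href of every link.
  const hrefs = await page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));

  const productUrls = new Set(); // a Set removes duplicate links
  for (const href of hrefs) {
    let parsed;
    try {
      parsed = new URL(href);
    } catch {
      continue; // skip hrefs that are not valid URLs
    }
    if (parsed.pathname.endsWith('/p')) {
      // Drop the query string and hash so the same product is stored once.
      productUrls.add(parsed.origin + parsed.pathname);
    }
  }
  return [...productUrls];
}

// Hypothetical helper: merge the product URLs of several category pages,
// visiting the categories one at a time on the same page object.
async function collectFromCategories(page, categoryUrls) {
  const all = new Set();
  for (const categoryUrl of categoryUrls) {
    for (const url of await collectProductUrls(page, categoryUrl)) {
      all.add(url);
    }
  }
  return [...all];
}
```

The array returned by collectFromCategories could then take the place of the hard-coded urls list in the loop. The category URLs themselves still have to come from somewhere, for example copied by hand from the site's menu.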



Answer 2:


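You can put the URLs in an array and call the function on each one with forEach. Note that forEach does not wait for an async callback, so every URL launches its own browser at the same time: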

const puppeteer = require('puppeteer');

const urls = [ 'https://www.jumbo.com.ar/gaseosa-sprite-sin-azucar-lima-limon-1-25-lt/p' ];

urls.forEach(scrapeProduct);

async function scrapeProduct(url) {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto(url);

    const [el2] = await page.$x('//*[@id="product-nonfood-page"]/main/div/div/div[1]/div[1]/div/div[2]/h1/div');
    const text2 = await el2.getProperty('textContent');
    const name = await text2.jsonValue();

    const [el] = await page.$x('//*[@id="product-nonfood-page"]/main/div/div/div[1]/div[1]/div/div[2]/div[2]/div[1]/div[2]/p[1]/em[2]/strong/text()');
    const text = await el.getProperty('textContent');
    const price = await text.jsonValue();

    console.log({name,price});

    await browser.close();
}
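
With a long list of URLs, starting one browser per URL all at once can exhaust memory and put heavy load on the site. One possible middle ground between forEach and a strictly sequential loop is to cap the number of calls that run in parallel. The sketch below is not part of Puppeteer: mapWithLimit is a hypothetical helper written in plain JavaScript, and the limit value is an assumption to tune.

```javascript
// Hypothetical helper: run `worker(item)` for every item, with at most `limit`
// calls in flight at once. Results keep the order of `items`.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has claimed yet

  // Each runner repeatedly claims the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await worker(items[index]) };
      } catch (error) {
        // Record the failure so one bad URL does not stop the others.
        results[index] = { ok: false, error };
      }
    }
  }

  // Start at least one runner, and never more runners than items.
  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results;
}

// Possible usage with the code above (assumed limit of 2 browsers at a time):
//   mapWithLimit(urls, 2, scrapeProduct);
```

Because scrapeProduct logs its output instead of returning it, the value fields would be undefined here. The ok and error fields still show which URLs failed, for example when an XPath matched nothing.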


Source: https://stackoverflow.com/questions/64295559/how-to-iterate-through-a-supermarket-website-and-getting-the-product-name-and-pr
