Is it possible to write a web crawler in JavaScript?

深忆病人 2021-02-01 07:48

I want to crawl a page, find the hyperlinks on it, follow those hyperlinks, and capture data from the pages they lead to.

11 answers
  •  一整个雨季
    2021-02-01 08:35

    Google's Chrome team released Puppeteer in August 2017. It is a Node library that provides a high-level API for controlling both headless and non-headless Chrome (headless Chrome has been available since version 59).

    It ships with a bundled version of Chromium, so it should work out of the box. If you want to use a specific Chrome version, you can launch Puppeteer with an executable path as an option, such as:

    const browser = await puppeteer.launch({executablePath: '/path/to/Chrome'});
    

    An example that navigates to a webpage and takes a screenshot of it shows how simple it is (taken from the GitHub page):

    const puppeteer = require('puppeteer');
    
    (async () => {
      const browser = await puppeteer.launch();
      const page = await browser.newPage();
      await page.goto('https://example.com');
      await page.screenshot({path: 'example.png'});
    
      await browser.close();
    })();
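
    To get from a single page to the crawl the question asks about, one possible approach is a breadth-first traversal that collects the links on each page and queues the ones on the same site. The sketch below is not from the Puppeteer docs: `crawl`, `crawlWithPuppeteer`, `getLinks`, and `maxPages` are hypothetical names chosen for illustration, and the same-origin restriction and page limit are assumptions to keep the crawl bounded. Only `crawlWithPuppeteer` needs Puppeteer installed; it uses `page.goto` and `page.$$eval` to read the `href` of every anchor.

```javascript
// Breadth-first crawl sketch. `getLinks(url)` is injected (hypothetical
// callback) so the traversal can run with or without a real browser.
async function crawl(startUrl, getLinks, maxPages = 20) {
  const start = new URL(startUrl);       // normalizes e.g. a missing trailing "/"
  const visited = new Set();
  const queue = [start.href];

  while (queue.length > 0 && visited.size < maxPages) {
    const url = queue.shift();
    if (visited.has(url)) continue;
    visited.add(url);

    let links = [];
    try {
      links = await getLinks(url);       // this is also where data could be captured
    } catch (err) {
      continue;                          // skip pages that fail to load
    }

    for (const link of links) {
      let next;
      try {
        next = new URL(link, url);       // resolve relative links against the page
      } catch {
        continue;                        // ignore malformed URLs
      }
      next.hash = '';                    // treat "#fragment" variants as one page
      // Assumption: stay on the starting site only.
      if (next.origin === start.origin && !visited.has(next.href)) {
        queue.push(next.href);
      }
    }
  }
  return [...visited];
}

// Puppeteer-backed link extractor (requires `npm install puppeteer`).
async function crawlWithPuppeteer(startUrl, maxPages) {
  const puppeteer = require('puppeteer'); // loaded lazily, only when called
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    const getLinks = async (url) => {
      await page.goto(url);
      // Runs in the page: collect the absolute href of every anchor.
      return page.$$eval('a[href]', (anchors) => anchors.map((a) => a.href));
    };
    return await crawl(startUrl, getLinks, maxPages);
  } finally {
    await browser.close();
  }
}
```

    Calling `crawlWithPuppeteer('https://example.com', 10)` should resolve to the list of URLs that were visited. Keeping the traversal separate from the browser code means `getLinks` can be swapped for a stub when testing, or extended to return scraped data alongside the links.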
    
