is it possible to write web crawler in javascript?

后端 未结 11 588
深忆病人
深忆病人 2021-02-01 07:48

I want to crawl the page and check for the hyperlinks in that respective page and also follow those hyperlinks and capture data from the page

11条回答
  •  执念已碎
    2021-02-01 08:39

    Generally, browser JavaScript can only crawl within the domain of its origin, because fetching pages would be done via Ajax, which is restricted by the Same-Origin Policy.

    If the page running the crawler script is on www.example.com, then that script can crawl all the pages on www.example.com, but not the pages of any other origin (unless some edge case applies, e.g., the Access-Control-Allow-Origin header is set for pages on the other server).

    If you really want to write a fully-featured crawler in browser JS, you could write a browser extension: for example, Chrome extensions are packaged Web application run with special permissions, including cross-origin Ajax. The difficulty with this approach is that you'll have to write multiple versions of the crawler if you want to support multiple browsers. (If the crawler is just for personal use, that's probably not an issue.)

提交回复
热议问题