Getting past a request limit when crawling a web site

盖世英雄少女心 2021-02-03 11:33

I'm working on a web crawler that indexes sites that don't want to be indexed.

My first attempt: I wrote a C# crawler that goes through each and every page and downloads them, but the site starts rejecting my requests once I hit its request limit.

4 Answers
  •  粉色の甜心
    2021-02-03 11:48

    Using proxies is by far the most common way to tackle this problem: by rotating requests across a pool of proxy IPs, no single address exceeds the site's per-client limit. There are also higher-level solutions that offer a sort of "page downloading as a service" and guarantee you get "clean" pages (no 404s, etc.). One of these is Crawlera (provided by my company), but there may be others.
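
    Here is a minimal sketch of the proxy-rotation idea in C#, to match the asker's crawler. The proxy URLs are placeholders you would replace with addresses from your own proxy pool or provider; this illustrates the general technique, not Crawlera's API, and a real crawler would add retries, delays, and error handling.

    ```csharp
    using System;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    class ProxyRotatingFetcher
    {
        // Hypothetical proxy endpoints -- substitute real ones from your provider.
        static readonly string[] Proxies =
        {
            "http://proxy1.example.com:8080",
            "http://proxy2.example.com:8080",
            "http://proxy3.example.com:8080",
        };

        static int _next;

        // Build a client that routes its requests through the next proxy in the pool.
        // (For a sketch we create one client per request; a production crawler should
        // reuse clients to avoid socket exhaustion.)
        static HttpClient NextClient()
        {
            var proxyUrl = Proxies[_next++ % Proxies.Length];
            var handler = new HttpClientHandler
            {
                Proxy = new WebProxy(proxyUrl),
                UseProxy = true,
            };
            return new HttpClient(handler);
        }

        static async Task Main()
        {
            var urls = new[]
            {
                "https://example.com/page1",
                "https://example.com/page2",
            };

            foreach (var url in urls)
            {
                using var client = NextClient();
                var html = await client.GetStringAsync(url);
                Console.WriteLine($"{url}: {html.Length} bytes");
            }
        }
    }
    ```

    Each request goes out from a different proxy IP, so the target site sees the load spread across the pool rather than one client hammering it.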
