Getting past the request limit when crawling a web site

盖世英雄少女心 · 2021-02-03 11:33

I'm working on a web crawler that indexes sites that don't want to be indexed.

My first attempt: I wrote a C# crawler that goes through each and every page and downloads…
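
Below is a minimal sketch of what such a sequential crawler looks like (the URLs are placeholders and error handling is omitted); fetching page after page from a single IP like this is exactly the pattern that runs into request limits:

// Minimal sketch of the naive approach: fetch pages one by one with HttpClient.
// The URLs are placeholders; a real crawler would also parse links,
// respect robots.txt, and handle errors and retries.
using System;
using System.Net.Http;
using System.Threading.Tasks;

class NaiveCrawler
{
    static async Task Main()
    {
        using var client = new HttpClient();
        var urls = new[] { "https://somesite.com/page1", "https://somesite.com/page2" };

        foreach (var url in urls)
        {
            // Sequential GET requests from a single IP: the target site can
            // easily throttle or block this pattern after enough hits.
            var html = await client.GetStringAsync(url);
            Console.WriteLine($"{url}: downloaded {html.Length} chars");
        }
    }
}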

4 answers
  •  一向 (OP)
    2021-02-03 11:54

    Whenever I need to get past the request limits of the sites I'm crawling, I usually do it with proxycrawl, as it's the fastest way to go. You don't have to worry about anything: infrastructure, IPs, getting blocked, etc.

    They have a simple API which you can call as frequently as you want, and it will always return a valid response that skips the limits.

    https://api.proxycrawl.com?url=https://somesite.com
    

    I've been using it for some months now and it works great. They even have a free plan.
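
    For illustration, here is a minimal C# sketch of routing a request through that API instead of hitting the target site directly. The token parameter and exact query format are assumptions on my part; check the proxycrawl docs for the real parameter names:

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    class ProxyCrawlExample
    {
        static async Task Main()
        {
            using var client = new HttpClient();

            // The target URL is passed as a query parameter, so it must be URL-encoded.
            var target = Uri.EscapeDataString("https://somesite.com");

            // Assumption: an API token is also required (see the proxycrawl docs
            // for the exact parameter names and your personal token).
            var apiUrl = $"https://api.proxycrawl.com/?token=YOUR_TOKEN&url={target}";

            // The service fetches the page on its side (rotating IPs, handling blocks)
            // and returns the target page's HTML in the response body.
            var html = await client.GetStringAsync(apiUrl);
            Console.WriteLine($"Got {html.Length} chars back");
        }
    }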
