I'm working on a web crawler that indexes sites that don't want to be indexed.
My first attempt: I wrote a C# crawler that goes through each and every page and downloads it.
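Roughly, that first version looked like this (just a sketch; the seed URL and SaveAndExtractLinks are stand-ins for the actual indexing and link extraction):

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Threading.Tasks;

class NaiveCrawler
{
    static async Task Main()
    {
        using var client = new HttpClient();
        var queue = new Queue<string>(new[] { "https://somesite.com" });
        var seen = new HashSet<string>();

        while (queue.Count > 0)
        {
            var url = queue.Dequeue();
            if (!seen.Add(url)) continue; // skip pages we already fetched

            // Plain direct request: fine at low volume, but the target site
            // starts rejecting requests once you pass its rate limit.
            var html = await client.GetStringAsync(url);
            SaveAndExtractLinks(html, queue);
        }
    }

    static void SaveAndExtractLinks(string html, Queue<string> queue)
    {
        // Persist the page and enqueue discovered links (omitted here).
    }
}
```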
Whenever I need to get past the request limits of the sites I'm crawling, I usually do it with ProxyCrawl, as it's the fastest way to go. You don't have to worry about anything: infrastructure, IPs, getting blocked, etc.
They have a simple API which you can call as frequently as you want, and it will always return a valid response, skipping the limits.
https://api.proxycrawl.com?url=https://somesite.com
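Calling it from C# boils down to one HttpClient request through their endpoint (a sketch: I'm assuming a `token` query parameter for auth, so check their docs for the exact scheme; YOUR_TOKEN is a placeholder):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class ProxyCrawlExample
{
    // Placeholder; ProxyCrawl issues a real token per account.
    const string Token = "YOUR_TOKEN";

    static async Task Main()
    {
        using var client = new HttpClient();

        // URL-encode the target so its own query string survives inside ours.
        var target = Uri.EscapeDataString("https://somesite.com");

        // ProxyCrawl fetches the page on our behalf and returns the body.
        var requestUrl = $"https://api.proxycrawl.com/?token={Token}&url={target}";

        var html = await client.GetStringAsync(requestUrl);
        Console.WriteLine(html.Substring(0, Math.Min(200, html.Length)));
    }
}
```

The nice part is that the crawler code barely changes: you keep the same fetch loop and just route every request through their endpoint instead of hitting the site directly.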
So far I've been using it for some months and it works great. They even have a free plan.