Getting past a request limit when crawling a web site

盖世英雄少女心 2021-02-03 11:33

I'm working on a web crawler that indexes sites that don't want to be indexed.

My first attempt: I wrote a C# crawler that goes through each and every page and downloads them, but the site starts rejecting my requests once I hit its request limit.

4 Answers
  •  粉色の甜心
    2021-02-03 11:48

    Using proxies is by far the most common way to tackle this problem: by rotating requests across a pool of proxy IPs, no single address exceeds the site's per-client limit. There are also higher-level solutions that offer a sort of "page downloading as a service" and guarantee you get "clean" pages (no 404s, etc.). One of these is Crawlera (provided by my company), but there may be others.
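
    Here is a minimal sketch of the proxy-rotation idea in C#, to match the asker's crawler. The proxy URLs are placeholders you would replace with addresses from your own proxy pool or provider; this illustrates the general technique, not Crawlera's API, and a real crawler would add retries, delays, and error handling.

    ```csharp
    using System;
    using System.Net;
    using System.Net.Http;
    using System.Threading.Tasks;

    class ProxyRotatingFetcher
    {
        // Hypothetical proxy endpoints -- substitute real ones from your provider.
        static readonly string[] Proxies =
        {
            "http://proxy1.example.com:8080",
            "http://proxy2.example.com:8080",
            "http://proxy3.example.com:8080",
        };

        static int _next;

        // Build a client that routes its requests through the next proxy in the pool.
        // (For a sketch we create one client per request; a production crawler should
        // reuse clients to avoid socket exhaustion.)
        static HttpClient NextClient()
        {
            var proxyUrl = Proxies[_next++ % Proxies.Length];
            var handler = new HttpClientHandler
            {
                Proxy = new WebProxy(proxyUrl),
                UseProxy = true,
            };
            return new HttpClient(handler);
        }

        static async Task Main()
        {
            var urls = new[]
            {
                "https://example.com/page1",
                "https://example.com/page2",
            };

            foreach (var url in urls)
            {
                using var client = NextClient();
                var html = await client.GetStringAsync(url);
                Console.WriteLine($"{url}: {html.Length} bytes");
            }
        }
    }
    ```

    Each request goes out from a different proxy IP, so the target site sees the load spread across the pool rather than one client hammering it.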
