Getting past a request limit when crawling a web site

Backend · Unresolved · 4 answers · 1252 views

盖世英雄少女心 2021-02-03 11:33

I'm working on a web crawler that indexes sites that don't want to be indexed.

My first attempt: I wrote a C# crawler that goes through each and every page and downloads it.

4 Answers
  •  盖世英雄少女心
    2021-02-03 12:01

    For cases like this I usually use https://gimmeproxy.com, which checks each proxy every second.

    To get a working proxy, you just need to make the following request:

    https://gimmeproxy.com/api/getProxy
    

    You will get a JSON response with all the proxy data, which you can then use as needed (a minimal usage sketch follows the sample response):

    {
      "supportsHttps": true,
      "protocol": "socks5",
      "ip": "156.182.122.82:31915",
      "port": "31915",
      "get": true,
      "post": true,
      "cookies": true,
      "referer": true,
      "user-agent": true,
      "anonymityLevel": 1,
      "websites": {
        "example": true,
        "google": false,
        "amazon": true
      },
      "country": "BR",
      "tsChecked": 1517952910,
      "curl": "socks5://156.182.122.82:31915",
      "ipPort": "156.182.122.82:31915",
      "type": "socks5",
      "speed": 37.78,
      "otherProtocols": {}
    }
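
    Below is a minimal sketch of how this response could be wired into a C# crawler. It assumes .NET 6 or later (SOCKS proxies such as the socks5:// address above are only supported by HttpClient from that version on), uses https://example.com/ as a stand-in target, and omits the retry logic a real crawler would need when the returned proxy turns out to be dead:

      using System;
      using System.Net;
      using System.Net.Http;
      using System.Text.Json;
      using System.Threading.Tasks;

      class ProxyCrawler
      {
          // Ask gimmeproxy.com for a fresh proxy and return its "curl" field,
          // e.g. "socks5://156.182.122.82:31915".
          static async Task<string> GetProxyAsync(HttpClient http)
          {
              string json = await http.GetStringAsync("https://gimmeproxy.com/api/getProxy");
              using JsonDocument doc = JsonDocument.Parse(json);
              return doc.RootElement.GetProperty("curl").GetString();
          }

          static async Task Main()
          {
              using var plain = new HttpClient();
              string proxyUrl = await GetProxyAsync(plain);

              // Route the crawler's requests through the returned proxy.
              // SOCKS schemes require .NET 6+; plain HTTP proxies also work on older runtimes.
              var handler = new HttpClientHandler
              {
                  Proxy = new WebProxy(proxyUrl),
                  UseProxy = true
              };

              using var crawler = new HttpClient(handler);
              // Hypothetical target page, standing in for whatever site is being crawled.
              string page = await crawler.GetStringAsync("https://example.com/");
              Console.WriteLine($"Fetched {page.Length} characters via {proxyUrl}");
          }
      }

    Requesting a new proxy whenever the target site starts refusing requests lets each batch of downloads come from a different IP, which is the point of rotating proxies here.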
    
