Getting past a request limit when crawling a web site

Backend · Unresolved · 4 answers · 1252 views

盖世英雄少女心 2021-02-03 11:33

I'm working on a web crawler that indexes sites that don't want to be indexed.

My first attempt: I wrote a C# crawler that goes through each and every page and downloads it.

4 Answers
  •  盖世英雄少女心
    2021-02-03 12:01

    For cases like this I usually use https://gimmeproxy.com, which checks each proxy every second.

    To get a working proxy, you just need to make the following request:

    https://gimmeproxy.com/api/getProxy
    

    You will get a JSON response with all the proxy data, which you can then use as needed (a minimal usage sketch follows the sample response):

    {
      "supportsHttps": true,
      "protocol": "socks5",
      "ip": "156.182.122.82:31915",
      "port": "31915",
      "get": true,
      "post": true,
      "cookies": true,
      "referer": true,
      "user-agent": true,
      "anonymityLevel": 1,
      "websites": {
        "example": true,
        "google": false,
        "amazon": true
      },
      "country": "BR",
      "tsChecked": 1517952910,
      "curl": "socks5://156.182.122.82:31915",
      "ipPort": "156.182.122.82:31915",
      "type": "socks5",
      "speed": 37.78,
      "otherProtocols": {}
    }
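
    Below is a minimal sketch of how this response could be wired into a C# crawler. It assumes .NET 6 or later (SOCKS proxies such as the socks5:// address above are only supported by HttpClient from that version on), uses https://example.com/ as a stand-in target, and omits the retry logic a real crawler would need when the returned proxy turns out to be dead:

      using System;
      using System.Net;
      using System.Net.Http;
      using System.Text.Json;
      using System.Threading.Tasks;

      class ProxyCrawler
      {
          // Ask gimmeproxy.com for a fresh proxy and return its "curl" field,
          // e.g. "socks5://156.182.122.82:31915".
          static async Task<string> GetProxyAsync(HttpClient http)
          {
              string json = await http.GetStringAsync("https://gimmeproxy.com/api/getProxy");
              using JsonDocument doc = JsonDocument.Parse(json);
              return doc.RootElement.GetProperty("curl").GetString();
          }

          static async Task Main()
          {
              using var plain = new HttpClient();
              string proxyUrl = await GetProxyAsync(plain);

              // Route the crawler's requests through the returned proxy.
              // SOCKS schemes require .NET 6+; plain HTTP proxies also work on older runtimes.
              var handler = new HttpClientHandler
              {
                  Proxy = new WebProxy(proxyUrl),
                  UseProxy = true
              };

              using var crawler = new HttpClient(handler);
              // Hypothetical target page, standing in for whatever site is being crawled.
              string page = await crawler.GetStringAsync("https://example.com/");
              Console.WriteLine($"Fetched {page.Length} characters via {proxyUrl}");
          }
      }

    Requesting a new proxy whenever the target site starts refusing requests lets each batch of downloads come from a different IP, which is the point of rotating proxies here.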
    
