Rotating Proxies for web scraping

感动是毒 2021-01-30 18:50

I've got a Python web crawler and I want to distribute the download requests among many different proxy servers, probably running Squid (though I'm open to alternatives). For

3 answers
  • 2021-01-30 19:41

    I've set up rotating proxies using HAProxy + DeleGate + multiple Tor instances. With Tor you don't have good control over bandwidth and latency, but it's useful for web scraping. I've just published an article on the subject: Running Your Own Anonymous Rotating Proxies

  • 2021-01-30 19:41

    Edit: There is even a Python wrapper for gimmeproxy: https://github.com/ericfourrier/gimmeproxy-api

    If you don't mind Node, you can use proxy-lists to collect public proxies and check-proxy to check them. That's exactly how https://gimmeproxy.com works; more info here

  • 2021-01-30 19:42

    Give your crawler a list of proxies and, for each HTTP request, have it use the next proxy from the list in round-robin fashion. However, this prevents you from using HTTP/1.1 persistent connections, because consecutive requests go to different proxies. Changes to the proxy list take effect as the rotation reaches them: a newly added proxy starts being used, and a removed one stops being used.
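
    The round-robin approach could be sketched roughly as follows, using only the standard library. The proxy addresses and the names RoundRobinProxies and fetch are hypothetical, chosen for illustration; the rotation reads a mutable list, so proxies can be added or removed while the crawler runs.

```python
import urllib.request

# Hypothetical proxy addresses (assumption); replace with your own servers.
PROXIES = [
    "http://proxy1.example.com:3128",
    "http://proxy2.example.com:3128",
    "http://proxy3.example.com:3128",
]


class RoundRobinProxies:
    """Hand out proxies from a mutable list, one after another."""

    def __init__(self, proxies):
        self.proxies = list(proxies)  # may be modified while crawling
        self._index = 0

    def next_proxy(self):
        if not self.proxies:
            return None  # empty list: the caller connects directly
        proxy = self.proxies[self._index % len(self.proxies)]
        self._index += 1
        return proxy


def fetch(url, rotator, timeout=10):
    """Download url through the next proxy in the rotation."""
    proxy = rotator.next_proxy()
    # An empty mapping tells ProxyHandler to use no proxy at all.
    mapping = {"http": proxy, "https": proxy} if proxy else {}
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(mapping))
    # A fresh opener per request: no connection is reused between requests.
    with opener.open(url, timeout=timeout) as response:
        return response.read()
```

    Usage would look like rotator = RoundRobinProxies(PROXIES) followed by fetch("http://example.com/", rotator) for each URL. This sketch is not thread-safe; a lock around next_proxy would be needed if several threads share one rotator.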

    Or have several connections open in parallel, one to each proxy, and distribute your crawling requests across the open connections. Adding and removing proxies dynamically can be implemented by having each connector register itself with the request dispatcher.
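
    A minimal sketch of the parallel variant, assuming one worker thread per proxy that pulls URLs from a shared queue. Dispatcher, register, crawl and http_fetcher are hypothetical names, and http_fetcher only handles plain http:// URLs (HTTPS through a proxy would need a CONNECT tunnel, which is left out here).

```python
import http.client
import queue
import threading


class Dispatcher:
    """Hands queued URLs to whichever registered connector is free."""

    def __init__(self):
        self.jobs = queue.Queue()
        self.results = queue.Queue()

    def register(self, name, fetch):
        """Called by a connector to start receiving URLs."""
        worker = threading.Thread(
            target=self._serve, args=(name, fetch), daemon=True
        )
        worker.start()
        return worker

    def _serve(self, name, fetch):
        while True:
            url = self.jobs.get()
            try:
                if url is None:  # sentinel: this connector shuts down
                    return
                self.results.put((url, name, fetch(url)))
            except Exception as exc:  # report the failure as the result
                self.results.put((url, name, exc))
            finally:
                self.jobs.task_done()

    def crawl(self, urls):
        """Queue all URLs, wait for completion, and collect the results."""
        for url in urls:
            self.jobs.put(url)
        self.jobs.join()
        collected = []
        while not self.results.empty():
            collected.append(self.results.get())
        return collected


def http_fetcher(proxy_host, proxy_port, timeout=10):
    """Return a fetch function that reuses one connection to one proxy."""
    conn = http.client.HTTPConnection(proxy_host, proxy_port, timeout=timeout)

    def fetch(url):
        # An HTTP proxy expects the absolute URL as the request target.
        conn.request("GET", url)
        return conn.getresponse().read()

    return fetch
```

    Each proxy would register with something like dispatcher.register("proxy1", http_fetcher("proxy1.example.com", 3128)), where the host name is a placeholder. Because every connector keeps its own connection open, persistent connections remain possible, and putting None on dispatcher.jobs retires one connector.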
