I open urls with:
site = urllib2.urlopen(\'http://google.com\')
And what I want to do is connect the same way with a proxy I got somewhere telli
You have to install a ProxyHandler
urllib2.install_opener(
urllib2.build_opener(
urllib2.ProxyHandler({'http': '127.0.0.1'})
)
)
urllib2.urlopen('http://www.google.com')
proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
You can set proxies using environment variables.
import os
os.environ['http_proxy'] = '127.0.0.1'
os.environ['https_proxy'] = '127.0.0.1'
urllib2
will add proxy handlers automatically this way. You need to set proxies for different protocols separately otherwise they will fail (in terms of not going through proxy), see below.
For example:
proxy = urllib2.ProxyHandler({'http': '127.0.0.1'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
# next line will fail (will not go through the proxy) (https)
urllib2.urlopen('https://www.google.com')
Instead
proxy = urllib2.ProxyHandler({
'http': '127.0.0.1',
'https': '127.0.0.1'
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
# this way both http and https requests go through the proxy
urllib2.urlopen('http://www.google.com')
urllib2.urlopen('https://www.google.com')
In Addition to the accepted answer: My scipt gave me an error
File "c:\Python23\lib\urllib2.py", line 580, in proxy_open
if '@' in host:
TypeError: iterable argument required
Solution was to add http:// in front of the proxy string:
proxy = urllib2.ProxyHandler({'http': 'http://proxy.xy.z:8080'})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
urllib2.urlopen('http://www.google.com')
One can also use requests if we would like to access a web page using proxies. Python 3 code:
>>> import requests
>>> url = 'http://www.google.com'
>>> proxy = '169.50.87.252:80'
>>> requests.get(url, proxies={"http":proxy})
<Response [200]>
More than one proxies can also be added.
>>> proxy1 = '169.50.87.252:80'
>>> proxy2 = '89.34.97.132:8080'
>>> requests.get(url, proxies={"http":proxy1,"http":proxy2})
<Response [200]>
To use the default system proxies (e.g. from the http_support environment variable), the following works for the current request (without installing it into urllib2 globally):
url = 'http://www.example.com/'
proxy = urllib2.ProxyHandler()
opener = urllib2.build_opener(proxy)
in_ = opener.open(url)
in_.read()