Question
I've created a Python script that uses urllib.request with an https proxy. I've tried the approaches below, but each one runs into a different issue, such as urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed ...>. The script is supposed to grab the IP address from that site. The IP address used in the script is a placeholder. I've complied with the suggestion made here.
First attempt:
import urllib.request
from bs4 import BeautifulSoup
url = 'https://whatismyipaddress.com/proxy-check'
headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
proxy_host = '60.191.11.246:3128'
req = urllib.request.Request(url,headers=headers)
req.set_proxy(proxy_host, 'https')
resp = urllib.request.urlopen(req).read()
soup = BeautifulSoup(resp,"html5lib")
ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
print(ip_addr)
Another way (using os.environ):
import os
import urllib.request
from bs4 import BeautifulSoup

url = 'https://whatismyipaddress.com/proxy-check'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
proxy = '60.191.11.246:3128'
os.environ["https_proxy"] = f'http://{proxy}'
req = urllib.request.Request(url, headers=headers)
resp = urllib.request.urlopen(req).read()
soup = BeautifulSoup(resp, "html5lib")
ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
print(ip_addr)
One more approach I've tried:
import urllib.request
from bs4 import BeautifulSoup

url = 'https://whatismyipaddress.com/proxy-check'
agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
proxy_host = '205.158.57.2:53281'
proxy = {'https': f'http://{proxy_host}'}
proxy_support = urllib.request.ProxyHandler(proxy)
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
opener.addheaders = [('User-agent', agent)]
res = opener.open(url).read()
soup = BeautifulSoup(res, "html5lib")
ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
print(ip_addr)
How can I use an https proxy with urllib.request in the right way?
Answer 1:
While testing the proxies, we hit Google's "unusual traffic from your computer network" error, which is what caused the failed responses, because whatismyipaddress relies on Google's services. The issue did not affect other sites such as stackoverflow.
from urllib import request
from bs4 import BeautifulSoup
url = 'https://whatismyipaddress.com/proxy-check'
proxies = {
    # 'https': 'https://167.172.229.86:8080',
    # 'https': 'https://51.91.137.248:3128',
    'https': 'https://118.70.144.77:3128',
}

user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
headers = {
    'User-Agent': user_agent,
    'accept-language': 'ru,en-US;q=0.9,en;q=0.8,tr;q=0.7'
}

proxy_support = request.ProxyHandler(proxies)
opener = request.build_opener(proxy_support)
# opener.addheaders = [('User-Agent', user_agent)]
request.install_opener(opener)

req = request.Request(url, headers=headers)

try:
    response = request.urlopen(req).read()
    soup = BeautifulSoup(response, "html5lib")
    ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
    print(ip_addr)
except Exception as e:
    print(e)
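As a sanity check (not part of the original answer), here is a minimal sketch that confirms requests really go through the proxy by asking an IP-echo endpoint which address it sees; it assumes httpbin.org/ip is reachable and returns JSON like {"origin": "<ip>"}. The proxy address is a placeholder, and whether the proxy URL needs an http:// or https:// scheme depends on the proxy itself (the answer above uses https://).
import json
from urllib import request

# Placeholder proxy address; replace with a live proxy before running.
proxy_host = '118.70.144.77:3128'

handler = request.ProxyHandler({
    'http': f'http://{proxy_host}',
    'https': f'http://{proxy_host}',  # https requests are tunnelled through the proxy via CONNECT
})
opener = request.build_opener(handler)

try:
    # httpbin.org/ip echoes back the IP address it sees for the request.
    with opener.open('https://httpbin.org/ip', timeout=15) as resp:
        seen_ip = json.loads(resp.read().decode())['origin']
    print(seen_ip)  # should be the proxy's address, not your own
except Exception as e:
    print('proxy check failed:', e)
If the printed address matches the proxy rather than your own IP, the ProxyHandler setup is working, and any remaining failures come from a dead proxy or the target site blocking it.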
Source: https://stackoverflow.com/questions/59594692/unable-to-use-https-proxy-within-urllib-request