Unable to use https proxy within urllib.request

Submitted by 放肆的年华 on 2020-01-14 03:38:17

Question


I've written a Python script that uses urllib.request with an https proxy. I've tried the approaches below, but they run into different errors, such as urllib.error.URLError: <urlopen error [WinError 10060] A connection attempt failed----. The script is supposed to grab the IP address from that site. The IP address used in the script is a placeholder. I've followed the suggestion made here.

First attempt:

import urllib.request
from bs4 import BeautifulSoup

url = 'https://whatismyipaddress.com/proxy-check'

headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}
proxy_host = '60.191.11.246:3128'

req = urllib.request.Request(url,headers=headers)
req.set_proxy(proxy_host, 'https')
resp = urllib.request.urlopen(req).read()
soup = BeautifulSoup(resp,"html5lib")
ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
print(ip_addr)

Another way (using os.environ):

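import os
import urllib.request
from bs4 import BeautifulSoup

url = 'https://whatismyipaddress.com/proxy-check'

headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'}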
proxy = '60.191.11.246:3128'

os.environ["https_proxy"] = f'http://{proxy}'
req = urllib.request.Request(url,headers=headers)
resp = urllib.request.urlopen(req).read()
soup = BeautifulSoup(resp,"html5lib")
ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
print(ip_addr)

One more approach that I've tried:

import urllib.request
from bs4 import BeautifulSoup

url = 'https://whatismyipaddress.com/proxy-check'

agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
proxy_host = '205.158.57.2:53281'
proxy = {'https': f'http://{proxy_host}'}

proxy_support = urllib.request.ProxyHandler(proxy)
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)
opener.addheaders = [('User-agent', agent)]
res = opener.open(url).read()

soup = BeautifulSoup(res,"html5lib")
ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
print(ip_addr)

How can I use an https proxy with urllib.request correctly?


Answer 1:


While we were testing the proxies, unusual traffic to Google services was detected from the computer network, and that was the cause of the response error, because whatismyipaddress uses Google's services. The issue did not affect other sites such as stackoverflow.
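One rough way to tell a proxy that cannot be reached from a site that answers with an error is to look at the type of exception urlopen raises. The sketch below is only an illustration: describe_failure is a hypothetical helper name, and the labels it returns are assumptions, not part of urllib.

```python
import socket
import urllib.error


def describe_failure(exc):
    """Return a rough label for an exception raised by urlopen (hypothetical helper)."""
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # A status code means some server (the proxy or the site) answered.
        return f"HTTP {exc.code}: a server answered, so a connection was made"
    if isinstance(exc, urllib.error.URLError):
        reason = exc.reason
        if isinstance(reason, (socket.timeout, TimeoutError)):
            return "timeout: no answer, the proxy may be down or unreachable"
        if isinstance(reason, ConnectionRefusedError):
            return "refused: nothing is listening on that host and port"
        return f"connection problem: {reason}"
    return f"other error: {exc!r}"
```

It could be called from an except block, for example print(describe_failure(e)), in place of printing the bare exception.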

from urllib import request
from bs4 import BeautifulSoup

url = 'https://whatismyipaddress.com/proxy-check'

proxies = {
    # 'https': 'https://167.172.229.86:8080',
    # 'https': 'https://51.91.137.248:3128',
    'https': 'https://118.70.144.77:3128',
}

user_agent = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36'
headers = {
    'User-Agent': user_agent,
    'accept-language': 'ru,en-US;q=0.9,en;q=0.8,tr;q=0.7'
}

proxy_support = request.ProxyHandler(proxies)
opener = request.build_opener(proxy_support)
# opener.addheaders = [('User-Agent', user_agent)]
request.install_opener(opener)

req = request.Request(url, headers=headers)
try:
    response = request.urlopen(req).read()
    soup = BeautifulSoup(response, "html5lib")
    ip_addr = soup.select_one("td:contains('IP')").find_next('td').text
    print(ip_addr)
except Exception as e:
    print(e)
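
A variation on the code above, as a minimal sketch: it keeps the opener local instead of calling install_opener, and passes a timeout so that a dead proxy fails quickly rather than hanging. The names build_proxy_opener and fetch are hypothetical, and the http:// scheme on the proxy URL is an assumption (it matches the third attempt in the question); for an https target, urllib is expected to tunnel through the proxy with CONNECT.

```python
import urllib.request


def build_proxy_opener(proxy_host, user_agent=None):
    """Build an opener that routes http and https requests through proxy_host.

    proxy_host is a 'host:port' string; the http:// scheme is an assumption.
    """
    proxy_url = f"http://{proxy_host}"
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    # Passing our own ProxyHandler replaces the default one in build_opener.
    opener = urllib.request.build_opener(handler)
    if user_agent:
        opener.addheaders = [("User-Agent", user_agent)]
    return opener


def fetch(opener, url, timeout=10):
    """Fetch url through the opener and return the raw response body."""
    with opener.open(url, timeout=timeout) as resp:
        return resp.read()
```

Usage would look like fetch(build_proxy_opener('118.70.144.77:3128', user_agent), url), with the returned bytes handed to BeautifulSoup as before. Because nothing is installed globally, other urlopen calls in the same program are not affected.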


Source: https://stackoverflow.com/questions/59594692/unable-to-use-https-proxy-within-urllib-request
