In Python 3.2, I can open and read an HTTPS web page with http.client, but urllib.request is failing to open the same page

Posted by 三世轮回 on 2019-12-04 04:06:54

What a coincidence! I'm having the same problem as you are, with an added complication: I'm behind a proxy. I found this bug report regarding https-not-working-with-urllib. Luckily, they posted a workaround.

import urllib.request
import ssl

# Force SSLv3 in the HTTPS handler to work around the handshake failure.
https_sslv3_handler = urllib.request.HTTPSHandler(
    context=ssl.SSLContext(ssl.PROTOCOL_SSLv3))
opener = urllib.request.build_opener(https_sslv3_handler)

## Uncomment this block if you're behind a proxy; it replaces the opener above.
## The HTTPS port is 443, but that didn't work for me, so I used port 80 instead.
##proxy_auth = '{0}://{1}:{2}@{3}'.format('https', 'username', 'password',
##                                        'proxy:80')
##proxies = {'https': proxy_auth}
##proxy = urllib.request.ProxyHandler(proxies)
##proxy_auth_handler = urllib.request.HTTPBasicAuthHandler()
##opener = urllib.request.build_opener(proxy, proxy_auth_handler,
##                                     https_sslv3_handler)

urllib.request.install_opener(opener)
resp = opener.open('https://yande.re/')
data = resp.read().decode('utf-8')
print(data)

Btw, thanks for showing how to use http.client. I didn't know that there's another library that can be used to connect to the internet. ;)

This is due to a bug in the early 1.x OpenSSL implementation of elliptic curve cryptography. Take a closer look at the relevant part of the exception:

_ssl.c:392: error:1411809D:SSL routines:SSL_CHECK_SERVERHELLO_TLSEXT:tls invalid ecpointformat list

This error comes from the underlying OpenSSL library code, which mishandles the EC point format TLS extension. One workaround is to use the SSLv3 method instead of SSLv23; another is to specify a cipher list that disables all ECC cipher suites (I had good results with ALL:-ECDH; use the openssl ciphers command to test a specification). The real fix is to update OpenSSL.
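The cipher-list workaround can be sketched roughly as follows. This is a minimal illustration, not the answerer's own code: the helper name build_opener_without_ecc is hypothetical, and ssl.create_default_context() assumes a recent Python (on 3.2 itself you would build the context with ssl.SSLContext(ssl.PROTOCOL_SSLv23) instead). It also prints which OpenSSL build Python is linked against, which helps when deciding whether an update is needed.

```python
import ssl
import urllib.request

# Show which OpenSSL build this Python is linked against.
print(ssl.OPENSSL_VERSION)


def build_opener_without_ecc(cipher_spec="ALL:-ECDH"):
    """Hypothetical helper: build an opener whose HTTPS handler
    uses a cipher list with the ECDH suites removed."""
    # Assumption: a recent Python; see the note above for Python 3.2.
    context = ssl.create_default_context()
    # Restrict the cipher list; raises ssl.SSLError if nothing matches.
    context.set_ciphers(cipher_spec)
    handler = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(handler)


opener = build_opener_without_ecc()
# resp = opener.open('https://yande.re/')  # network call, left commented out
```

Note that on current OpenSSL versions set_ciphers() does not affect TLS 1.3 suites, so this sketch is mainly relevant to the older handshakes discussed here.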

The problem is due to the hostnames that you're giving in the two examples:

import http.client
conn = http.client.HTTPSConnection('www.yande.re')
conn.request('GET', 'https://yande.re/')

and...

import urllib.request
urllib.request.urlopen('https://yande.re/')

Note that in the first example you're asking the client to connect to the host www.yande.re, whereas in the second example urllib first parses the URL 'https://yande.re/' and then sends the request to the host yande.re.

Although www.yande.re and yande.re may resolve to the same IP address, from the web server's perspective they are different virtual hosts. My guess is that you had an SNI configuration problem on your web server's side. Seeing that the original question was posted on May 21 and the current cert at yande.re starts on May 28, I'm thinking that you've already fixed this problem?
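To make the mismatch concrete, here is a small sketch that compares the host passed explicitly to http.client with the host urllib would extract from the URL. The variable names are only for illustration, and no network connection is made.

```python
from urllib.parse import urlsplit

# Host given explicitly to http.client.HTTPSConnection in the first example.
explicit_host = 'www.yande.re'

# Host that urllib derives from the URL in the second example.
parsed_host = urlsplit('https://yande.re/').hostname

print(explicit_host)                 # www.yande.re
print(parsed_host)                   # yande.re
print(explicit_host == parsed_host)  # False
```

For a like-for-like comparison, both clients should be pointed at the same hostname, for example by passing parsed_host to http.client.HTTPSConnection.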


Try this:

import urllib.request
import urllib.error

url = 'http://www.google.com/'

try:
    # Open the page and read the response body.
    webpage = urllib.request.urlopen(url).read()
except urllib.error.URLError:
    # Fall back to a placeholder message if the page can't be fetched.
    webpage = b'This webpage is not available!'

print(webpage)