I\'m using a proxy (behind corporate firewall), to login to an https domain. The SSL handshake doesn\'t seem to be going well:
CertificateError: hostname \'ats.f
This bug in ssl.math_hostname appears in v2.7.9 (it's not in 2.7.5), and has to do with not stripping out the hostname from the hostname:port syntax. The following rewrite of ssl.match_hostname fixes the bug. Put this before your mechanize code:
import functools, re, urlparse
import ssl
old_match_hostname = ssl.match_hostname
@functools.wraps(old_match_hostname)
def match_hostname_bugfix_ssl_py_2_7_9(cert, hostname):
m = re.search(r':\d+$',hostname) # hostname:port
if m is not None:
o = urlparse.urlparse('https://' + hostname)
hostname = o.hostname
old_match_hostname(cert, hostname)
ssl.match_hostname = match_hostname_bugfix_ssl_py_2_7_9
The following mechanize code should now work:
import mechanize
import cookielib
br = mechanize.Browser()
# Cookie Jar
cj = cookielib.LWPCookieJar()
br.set_cookiejar(cj)
# Browser options
br.set_handle_equiv(True)
br.set_handle_gzip(True)
br.set_handle_redirect(True)
br.set_handle_referer(True)
br.set_handle_robots(False)
# Follows refresh 0 but not hang on refresh > 0
br.set_handle_refresh(mechanize._http.HTTPRefreshProcessor(), max_time=1)
br.addheaders = [('User-Agent', 'Nutscrape 1.0')]
# Use this proxy
br.set_proxies({"http": "localhost:3128", "https": "localhost:3128"})
r = br.open('https://www.duckduckgo.com:443/')
html = br.response().read()
# Examine the html response from a browser
f = open('foo.html','w')
f.write(html)
f.close()