Question
I use Python 2.6 and query the Facebook API (over HTTPS). I suspect my service could be the target of man-in-the-middle attacks. Rereading the urllib module documentation this morning, I found this warning:

Warning: When opening HTTPS URLs, it is not attempted to validate the server certificate. Use at your own risk!

Do you have hints / URLs / examples for performing full certificate validation?

Thanks for your help.
Answer 1:
You could create a urllib2 opener that does the validation for you using a custom handler. The following code is an example that works with Python 2.7.3. It assumes you have downloaded http://curl.haxx.se/ca/cacert.pem to the same folder where the script is saved.
#!/usr/bin/env python
import urllib2
import httplib
import ssl
import socket
import os

CERT_FILE = os.path.join(os.path.dirname(__file__), 'cacert.pem')


class ValidHTTPSConnection(httplib.HTTPConnection):
    "This class allows communication via SSL."

    default_port = httplib.HTTPS_PORT

    def __init__(self, *args, **kwargs):
        httplib.HTTPConnection.__init__(self, *args, **kwargs)

    def connect(self):
        "Connect to a host on a given (SSL) port."
        sock = socket.create_connection((self.host, self.port),
                                        self.timeout, self.source_address)
        if self._tunnel_host:
            self.sock = sock
            self._tunnel()
        # Note: wrap_socket verifies the certificate chain against
        # CERT_FILE, but it does NOT check that the certificate's
        # hostname matches self.host.
        self.sock = ssl.wrap_socket(sock,
                                    ca_certs=CERT_FILE,
                                    cert_reqs=ssl.CERT_REQUIRED)


class ValidHTTPSHandler(urllib2.HTTPSHandler):

    def https_open(self, req):
        return self.do_open(ValidHTTPSConnection, req)


opener = urllib2.build_opener(ValidHTTPSHandler)


def test_access(url):
    print "Accessing", url
    page = opener.open(url)
    print page.info()
    data = page.read()
    print "First 100 bytes:", data[0:100]
    print "Done accessing", url
    print ""

# This should work
test_access("https://www.google.com")

# Accessing a page with a self-signed certificate should not work.
# At the time of writing, the following page used a self-signed certificate.
test_access("https://tidia.ita.br/")
Running this script, you should see output like this:
Accessing https://www.google.com
Date: Mon, 14 Jan 2013 14:19:03 GMT
Expires: -1
...
First 100 bytes: <!doctype html><html itemscope="itemscope" itemtype="http://schema.org/WebPage"><head><meta itemprop
Done accessing https://www.google.com
Accessing https://tidia.ita.br/
Traceback (most recent call last):
File "https_validation.py", line 54, in <module>
test_access("https://tidia.ita.br/")
File "https_validation.py", line 42, in test_access
page = opener.open(url)
...
File "/usr/local/Cellar/python/2.7.3/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 1177, in do_open
raise URLError(err)
urllib2.URLError: <urlopen error [Errno 1] _ssl.c:504: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed>
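As a side note if the handler above is adapted: ssl.wrap_socket verifies the certificate chain but not that the certificate matches the host you asked for. On Python 2.7.9+ and 3.4+, both checks come pre-configured in ssl.create_default_context(); a minimal sketch:

```python
import ssl

# create_default_context() loads the system trust store and enables
# both chain verification and hostname checking by default.
ctx = ssl.create_default_context()
print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
print(ctx.check_hostname)                    # True
```

A context built this way can be passed to urllib2.urlopen (or urllib.request.urlopen on Python 3) via the context argument.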
Answer 2:
If you have a trusted Certificate Authority (CA) file, you can use the ssl library (available since Python 2.6) to validate the certificate. Here's some code:
import os.path
import ssl
import sys
import urlparse
import urllib


def get_ca_path():
    '''Download the Mozilla CA file cached by the cURL project.

    If you have a trusted CA file from your OS, return the path
    to that instead.
    '''
    cafile_local = 'cacert.pem'
    cafile_remote = 'http://curl.haxx.se/ca/cacert.pem'
    if not os.path.isfile(cafile_local):
        print >> sys.stderr, "Downloading %s from %s" % (
            cafile_local, cafile_remote)
        urllib.urlretrieve(cafile_remote, cafile_local)
    return cafile_local


def check_ssl(hostname, port=443):
    '''Check that an SSL certificate is valid.'''
    print >> sys.stderr, "Validating SSL cert at %s:%d" % (
        hostname, port)
    cafile_local = get_ca_path()
    try:
        server_cert = ssl.get_server_certificate((hostname, port),
                                                 ca_certs=cafile_local)
    except ssl.SSLError:
        print >> sys.stderr, "SSL cert at %s:%d is invalid!" % (
            hostname, port)
        raise


class CheckedSSLUrlOpener(urllib.FancyURLopener):
    '''A URL opener that checks that SSL certificates are valid.

    On SSL error, it will raise ssl.SSLError.
    '''

    def open(self, fullurl, data=None):
        urlbits = urlparse.urlparse(fullurl)
        if urlbits.scheme == 'https':
            if ':' in urlbits.netloc:
                hostname, port = urlbits.netloc.split(':')
                port = int(port)  # split() yields a string
            else:
                hostname = urlbits.netloc
                port = urlbits.port if urlbits.port is not None else 443
            check_ssl(hostname, port)
        return urllib.FancyURLopener.open(self, fullurl, data)

# Plain usage - can probably do once per day
check_ssl('www.facebook.com')

# URL Opener
opener = CheckedSSLUrlOpener()
opener.open('https://www.facebook.com/find-friends/browser/')

# Make it the default
urllib._urlopener = opener
urllib.urlopen('https://www.facebook.com/find-friends/browser/')
Some dangers with this code:

- You have to trust the CA file from the cURL project (http://curl.haxx.se/ca/cacert.pem), which is a cached version of Mozilla's CA file. It is also served over HTTP, so downloading it is itself open to a MITM attack. It's better to replace get_ca_path with one that returns your local CA file, which will vary from host to host.
- There is no attempt to see if the CA file has been updated. Eventually, root certs will expire or be deactivated, and new ones will be added. A good idea would be to use a cron job to delete the cached CA file, so that a new one is downloaded daily.
- It's probably overkill to check certificates every time. You could manually check once per run, or keep a list of 'known good' hosts over the course of the run. Or, be paranoid!
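The stale-bundle concern can also be handled without cron by checking the cached file's age before use. A small sketch, using a hypothetical ca_file_is_stale helper that get_ca_path could call before deciding to re-download:

```python
import os
import time

MAX_AGE_SECONDS = 24 * 60 * 60  # refresh the bundle daily


def ca_file_is_stale(path, max_age=MAX_AGE_SECONDS):
    """Return True when the cached CA bundle is missing or older than
    max_age seconds, signalling that a fresh copy should be fetched."""
    if not os.path.isfile(path):
        return True
    return (time.time() - os.path.getmtime(path)) > max_age
```

When this returns True, get_ca_path would delete the cached copy and fall through to its download branch.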
Source: https://stackoverflow.com/questions/6648952/urllib-and-validation-of-server-certificate