Tell urllib2 to use custom DNS

终归单人心 2020-11-30 01:39

I'd like to tell urllib2.urlopen (or a custom opener) to use 127.0.0.1 (or ::1) to resolve addresses. I wouldn't ch

3 Answers
  • 2020-11-30 01:49

    Looks like name resolution is ultimately handled by socket.create_connection.

    -> urllib2.urlopen
    -> httplib.HTTPConnection
    -> socket.create_connection
    

    Once the "Host:" header has been set, though, you can resolve the host yourself and pass the IP address down to the connection.

    I'd suggest that you subclass httplib.HTTPConnection, and wrap the connect method to modify self.host before passing it to socket.create_connection.

    Then subclass HTTPHandler (and HTTPSHandler) to replace the http_open method with one that passes your HTTPConnection instead of httplib's own to do_open.

    Like this:

    import urllib2
    import httplib
    import socket
    import ssl  # needed by MyHTTPSConnection.connect below
    
    def MyResolver(host):
      if host == 'news.bbc.co.uk':
        return '66.102.9.104' # Google IP
      else:
        return host
    
    class MyHTTPConnection(httplib.HTTPConnection):
      def connect(self):
        self.sock = socket.create_connection((MyResolver(self.host),self.port),self.timeout)
    
    class MyHTTPSConnection(httplib.HTTPSConnection):
      def connect(self):
        sock = socket.create_connection((MyResolver(self.host), self.port), self.timeout)
        self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
    
    class MyHTTPHandler(urllib2.HTTPHandler):
      def http_open(self,req):
        return self.do_open(MyHTTPConnection,req)
    
    class MyHTTPSHandler(urllib2.HTTPSHandler):
      def https_open(self,req):
        return self.do_open(MyHTTPSConnection,req)
    
    opener = urllib2.build_opener(MyHTTPHandler,MyHTTPSHandler)
    urllib2.install_opener(opener)
    
    f = urllib2.urlopen('http://news.bbc.co.uk')
    data = f.read()
    from lxml import etree
    doc = etree.HTML(data)
    
    >>> print doc.xpath('//title/text()')
    ['Google']
    

    Obviously there are certificate issues if you use HTTPS, and you'll need to fill out MyResolver...
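
For readers on Python 3, where urllib2 and httplib became urllib.request and http.client, the same idea can be sketched roughly as follows. This is only an illustration under stated assumptions: HOST_OVERRIDES, resolve and the Resolving* class names are hypothetical, and overriding connect() this way skips the proxy-tunnel setup that the stock http.client connect() performs.

```python
import http.client
import socket
import urllib.request

# Hypothetical override table: hostname -> IP address to connect to.
HOST_OVERRIDES = {}


def resolve(host):
    """Return the overriding IP for host, or host itself if none is set."""
    return HOST_OVERRIDES.get(host, host)


class ResolvingHTTPConnection(http.client.HTTPConnection):
    def connect(self):
        # Connect to the overridden address. self.host is left untouched,
        # so the Host: header still carries the original name.
        self.sock = socket.create_connection(
            (resolve(self.host), self.port), self.timeout, self.source_address)


class ResolvingHTTPHandler(urllib.request.HTTPHandler):
    def http_open(self, req):
        # Hand our connection class to do_open instead of the default one.
        return self.do_open(ResolvingHTTPConnection, req)


opener = urllib.request.build_opener(ResolvingHTTPHandler)
```

After adding an entry such as HOST_OVERRIDES['news.bbc.co.uk'] = '127.0.0.1', calls to opener.open('http://news.bbc.co.uk/') would connect to the local address while still sending the original host name.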

  • 2020-11-30 02:01

    You will need to implement your own DNS lookup client (or use dnspython, as you said). The name lookup procedure in glibc is fairly complex, to ensure compatibility with other, non-DNS name systems. For example, there is no way to specify a particular DNS server through the glibc library at all.
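
To give a sense of what "your own DNS lookup client" involves, here is a minimal Python 3 sketch that builds an A-record query by hand and sends it over UDP to a chosen server. The function names (build_query, skip_name, parse_a_records, lookup) are hypothetical, and the code deliberately omits things a real client needs: it does not check the response ID, retry, fall back to TCP on truncation, or handle IPv6. A library such as dnspython covers those cases.

```python
import socket
import struct


def build_query(name, query_id=0x1234):
    """Build a DNS query packet asking for the A record of name."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack('>HHHHHH', query_id, 0x0100, 1, 0, 0, 0)
    qname = b''.join(
        struct.pack('B', len(label)) + label
        for label in name.rstrip('.').encode('ascii').split(b'.')
    ) + b'\x00'
    return header + qname + struct.pack('>HH', 1, 1)  # QTYPE=A, QCLASS=IN


def skip_name(packet, offset):
    """Return the offset just past the (possibly compressed) name."""
    while True:
        length = packet[offset]
        if length == 0:
            return offset + 1
        if length & 0xC0 == 0xC0:  # compression pointer, two bytes long
            return offset + 2
        offset += 1 + length


def parse_a_records(packet):
    """Return the IPv4 addresses found in the answer section."""
    _id, _flags, qdcount, ancount, _ns, _ar = struct.unpack(
        '>HHHHHH', packet[:12])
    offset = 12
    for _ in range(qdcount):
        offset = skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    addresses = []
    for _ in range(ancount):
        offset = skip_name(packet, offset)
        rtype, rclass, _ttl, rdlength = struct.unpack(
            '>HHIH', packet[offset:offset + 10])
        offset += 10
        if rtype == 1 and rclass == 1 and rdlength == 4:
            addresses.append(socket.inet_ntoa(packet[offset:offset + 4]))
        offset += rdlength  # also skips non-A records such as CNAME
    return addresses


def lookup(name, server='127.0.0.1', port=53, timeout=2.0):
    """Ask the given DNS server for the A records of name over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, port))
        response, _ = sock.recvfrom(512)
    finally:
        sock.close()
    return parse_a_records(response)
```

The addresses returned by lookup() could then feed a resolver hook like the MyResolver function in the first answer.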

  • 2020-11-30 02:14

    Another (dirty) way is monkey-patching socket.getaddrinfo.

    For example, this code adds an (unbounded) cache for DNS lookups.

    import socket
    prv_getaddrinfo = socket.getaddrinfo
    dns_cache = {}  # or a weakref.WeakValueDictionary()
    def new_getaddrinfo(*args):
        try:
            return dns_cache[args]
        except KeyError:
            res = prv_getaddrinfo(*args)
            dns_cache[args] = res
            return res
    socket.getaddrinfo = new_getaddrinfo
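
The same monkey-patching trick can redirect specific host names instead of caching them, which is closer to what the question asks for. The sketch below is written for Python 3; HOST_OVERRIDES and overriding_getaddrinfo are hypothetical names, and the mapping shown is only a placeholder.

```python
import socket

# Hypothetical override table: hostname -> IP address to use instead.
HOST_OVERRIDES = {'news.example.invalid': '127.0.0.1'}

_original_getaddrinfo = socket.getaddrinfo


def overriding_getaddrinfo(host, *args, **kwargs):
    """Look host up in HOST_OVERRIDES first, then defer to the system
    resolver. All other arguments are passed through unchanged."""
    return _original_getaddrinfo(
        HOST_OVERRIDES.get(host, host), *args, **kwargs)


# Process-wide patch: every caller of socket.getaddrinfo is affected.
socket.getaddrinfo = overriding_getaddrinfo
```

Because socket.create_connection looks getaddrinfo up in the socket module at call time, HTTP connections made after the patch pick up the override. Assigning _original_getaddrinfo back to socket.getaddrinfo undoes it.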
    