Python urllib2 force IPv4

夕颜 2021-01-18 06:55

I am running a Python script that uses urllib2 to grab data from a weather API and display it on screen. The problem is that when I query the server, I get a \

2 Answers
  • 2021-01-18 07:20

    Not a proper answer but an alternative: call curl?

    import subprocess
    import sys
    
    def log_error(msg):
        sys.stderr.write(msg + '\n')
    
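    def curl(url):
        # -f: fail on HTTP errors, -sS: quiet but still report errors,
        # -k: skip TLS certificate checks, -L: follow redirects, -4: IPv4 only
        process = subprocess.Popen(
            ["curl", "-fsSkL4", url],
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        stdout, stderr = process.communicate()
        if process.returncode == 0:
            return stdout
        else:
            log_error("Failed to fetch: %s" % url)
            log_error(stderr)
            sys.exit(3)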
    
  • 2021-01-18 07:38

    Not directly, no.

    So, what can you do?


    One possibility is to explicitly resolve the hostname to IPv4 yourself, and then use the IPv4 address instead of the name as the host. For example:

    import socket
    import urllib2
    host = socket.gethostbyname('example.com')  # IPv4-only lookup
    page = urllib2.urlopen('http://{}/path'.format(host))
    

    However, some virtual-server sites may require a Host: example.com header, and they will instead get a Host: 93.184.216.119. You can work around that by overriding the header:

    host = socket.gethostbyname('example.com')
    request = urllib2.Request('http://{}/path'.format(host),
                              headers={'Host': 'example.com'})
    page = urllib2.urlopen(request)
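
    The two snippets above can be folded into one helper. This is only a sketch: ipv4_request is a made-up name, and the try/except import is an assumption added so that the same code also runs on Python 3, where urllib2 became urllib.request.

```python
import socket

try:  # Python 2, as used in the answer
    import urllib2
    from urlparse import urlsplit, urlunsplit
except ImportError:  # Python 3 names for the same functionality (assumption)
    import urllib.request as urllib2
    from urllib.parse import urlsplit, urlunsplit


def ipv4_request(url):
    """Build a Request aimed at the host's IPv4 address.

    Hypothetical helper: the URL gets the resolved address, while the
    original name (and port, if any) is kept in the Host header.
    """
    parts = urlsplit(url)
    address = socket.gethostbyname(parts.hostname)  # IPv4-only lookup
    if parts.port is None:
        netloc = address
        host_header = parts.hostname
    else:
        netloc = '%s:%d' % (address, parts.port)
        host_header = '%s:%d' % (parts.hostname, parts.port)
    ip_url = urlunsplit((parts.scheme, netloc, parts.path,
                         parts.query, parts.fragment))
    return urllib2.Request(ip_url, headers={'Host': host_header})
```

    It would be used as page = urllib2.urlopen(ipv4_request('http://example.com/path')). For https URLs this trick is likely to fail certificate verification, because the certificate is checked against the address in the URL rather than the original name.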
    

    Alternatively, you can provide your own handlers in place of the standard ones. But the standard handler is mostly just a wrapper around httplib.HTTPConnection, and the real problem is in HTTPConnection.connect.

    So, the clean way to do this is to create your own subclass of httplib.HTTPConnection, which overrides connect like this:

    def connect(self):
        # gethostbyname only returns IPv4 addresses
        host = socket.gethostbyname(self.host)
        self.sock = socket.create_connection((host, self.port),
                                             self.timeout, self.source_address)
        if self._tunnel_host:
            self._tunnel()
    

    Then create your own subclass of urllib2.HTTPHandler that overrides http_open to use your subclass:

    def http_open(self, req):
        # MyHTTPConnection is the HTTPConnection subclass described above
        return self.do_open(MyHTTPConnection, req)
    

    … and similarly for HTTPSHandler. Then pass your handlers to urllib2.build_opener (and, if you want urllib2.urlopen to use them, urllib2.install_opener), as shown in the urllib2 docs.
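
    Put together, the subclass approach might look like the sketch below. The class names IPv4HTTPConnection and IPv4HTTPHandler are illustrative, not part of any library, and the try/except import is an assumption so the code also runs on Python 3. Only plain http is covered; https would need a matching connection and handler pair.

```python
import socket

try:  # Python 2, as used in the answer
    import httplib
    import urllib2
except ImportError:  # Python 3 names for the same modules (assumption)
    import http.client as httplib
    import urllib.request as urllib2


class IPv4HTTPConnection(httplib.HTTPConnection):
    """HTTPConnection that resolves its host to IPv4 only (illustrative name)."""

    def connect(self):
        host = socket.gethostbyname(self.host)  # IPv4-only lookup
        self.sock = socket.create_connection(
            (host, self.port), self.timeout, self.source_address)
        if self._tunnel_host:
            self._tunnel()


class IPv4HTTPHandler(urllib2.HTTPHandler):
    """Handler that opens http:// URLs through IPv4HTTPConnection."""

    def http_open(self, req):
        return self.do_open(IPv4HTTPConnection, req)


# build_opener drops the default HTTPHandler when given a subclass of it.
opener = urllib2.build_opener(IPv4HTTPHandler)
# urllib2.install_opener(opener)  # optional: route urllib2.urlopen through it
```

    After this, opener.open('http://example.com/path') connects over IPv4 while still sending the normal Host header, since the URL itself is unchanged.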

    The quick & dirty way to do the same thing is to just monkeypatch httplib.HTTPConnection.connect to the above function.
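
    A slightly less dirty variant of the monkeypatch is to scope it, so the original connect comes back afterwards. A minimal sketch, assuming a hypothetical force_ipv4 context manager and the same Python 2/3 import fallback:

```python
import contextlib
import socket

try:  # Python 2, as used in the answer
    import httplib
except ImportError:  # Python 3 name for the same module (assumption)
    import http.client as httplib


def _connect_ipv4(self):
    """Replacement for HTTPConnection.connect that only uses IPv4."""
    host = socket.gethostbyname(self.host)  # IPv4-only lookup
    self.sock = socket.create_connection(
        (host, self.port), self.timeout, self.source_address)
    if self._tunnel_host:
        self._tunnel()


@contextlib.contextmanager
def force_ipv4():
    """Temporarily patch HTTPConnection.connect (hypothetical helper)."""
    original = httplib.HTTPConnection.connect
    httplib.HTTPConnection.connect = _connect_ipv4
    try:
        yield
    finally:
        # Restore the stock method even if the request raised.
        httplib.HTTPConnection.connect = original
```

    Any request made inside a with force_ipv4(): block then goes through the patched method. The patch is still process-wide while the block is active, so other threads using httplib are affected too.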


    Finally, you could use a different library instead of urllib2. From what I remember, requests doesn't make this any easier (ultimately, you have to override or monkeypatch slightly different methods, but it's effectively the same). However, any libcurl wrapper will allow you to do the equivalent of curl_easy_setopt(h, CURLOPT_IPRESOLVE, CURL_IPRESOLVE_V4).
