Python urllib2: Cannot assign requested address

眼角桃花 2021-01-07 13:28

I am sending thousands of requests using urllib2 with proxies. On execution, I receive many instances of the following error:

urlopen error [Errno 9         


        
3 answers
  • 2021-01-07 13:58

    I had a similar issue, though I was sending POST requests with Python's requests library. To make it worse, I used multiprocessing over each executor to post to a server, so thousands of connections were created within seconds, and each took a few seconds to leave the TIME_WAIT state and release its port for the next set of connections.

    Of all the solutions available on the internet, which talk about disabling keep-alive, using with requests.Session(), and so on, the one that worked for me sets 'Connection': 'close' as a request header. You may need to define the headers on a separate line outside the post call, though.

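    import requests

    # Ask the server to close the connection after each response.
    headers = {
        'Connection': 'close'
    }
    # files is the upload payload; it is assumed to be built elsewhere.
    with requests.Session() as session:
        response = session.post('https://xx.xxx.xxx.x/xxxxxx/x', headers=headers, files=files, verify=False)
        results = response.json()
        print(results)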
    

    Give it a try with the requests library.

  • 2021-01-07 14:02

    Here is an answer I wrote earlier (much earlier) to a similar-looking question: "Socket in use error when reusing sockets".

    The error is different, but the underlying problem is probably the same: you are consuming all available ports and trying to reuse them before the TIME_WAIT state has ended.
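
    To check whether TIME_WAIT sockets really are eating into the port range, something like the following rough sketch could help. It assumes a Linux host, reads only IPv4 sockets from /proc/net/tcp, and is written for Python 3. The function names are made up for illustration. The comparison is only an indicator, since the kernel can share a local port between connections to different destinations.

```python
import os

# Hexadecimal state code that Linux prints for TIME_WAIT in /proc/net/tcp.
TIME_WAIT = "06"


def count_time_wait(proc_net_tcp_text):
    """Count sockets in TIME_WAIT in the text of /proc/net/tcp."""
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        # The fourth column ("st") holds the socket state.
        if len(fields) > 3 and fields[3] == TIME_WAIT:
            count += 1
    return count


def ephemeral_port_count(port_range_text):
    """Return how many ports lie in the range from ip_local_port_range."""
    low, high = (int(part) for part in port_range_text.split())
    return high - low + 1


if __name__ == "__main__":
    tcp_path = "/proc/net/tcp"
    range_path = "/proc/sys/net/ipv4/ip_local_port_range"
    # Only attempt the read on systems that expose these files.
    if os.path.exists(tcp_path) and os.path.exists(range_path):
        with open(tcp_path) as f:
            waiting = count_time_wait(f.read())
        with open(range_path) as f:
            available = ephemeral_port_count(f.read())
        print("TIME_WAIT sockets: %d, ephemeral ports: %d" % (waiting, available))
```

    If the first number approaches the second while the requests are running, port exhaustion is a plausible cause of the error.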

    [EDIT: in response to comments]

    If it is within the capability/spec for your application, one obvious strategy is to control the rate of connections to avoid this situation.
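
    As a minimal sketch of what controlling the rate could look like, the class below spaces calls out so that no more than a given number happen per second. It is written for Python 3 (time.monotonic is not available in Python 2), and RateLimiter and its parameters are hypothetical names for illustration, not part of urllib2 or any library.

```python
import time


class RateLimiter(object):
    """Allow at most `rate` calls per second by sleeping between calls."""

    def __init__(self, rate, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate   # minimum spacing between calls, in seconds
        self.clock = clock           # injectable so the limiter can be tested
        self.sleep = sleep
        self.next_allowed = None     # earliest time the next call may proceed

    def wait(self):
        """Block until the next call is allowed, then reserve the next slot."""
        now = self.clock()
        if self.next_allowed is not None and now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval


# Usage sketch: call limiter.wait() before opening each connection.
#
#     limiter = RateLimiter(rate=50)   # 50 requests per second is an arbitrary choice
#     for url in urls:
#         limiter.wait()
#         response = urllib.request.urlopen(url)
```

    A suitable rate depends on the size of the port range and how long sockets stay in TIME_WAIT, so the figure has to be tuned for the system in question.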

    Alternatively, you could use the httplib module. httplib.HTTPConnection() accepts a source_address tuple that lets you choose the port from which the connection is made. For example, this connects to localhost:1234 from localhost:9999:

    import httplib

    # Bind the local end of the connection to localhost:9999.
    conn = httplib.HTTPConnection('localhost:1234', source_address=('localhost', 9999))
    conn.request('GET', '/index.html')
    

    Then it is a matter of managing the source port assignment as described in my earlier answer. If you are on Windows you can use this method to get around the default range of ports 1024-5000.
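
    One way to manage the source ports, shown here only as a sketch, is to hand them out round-robin from a fixed range. This is written for Python 3, where httplib is named http.client. The helper names and the port range are assumptions for illustration and are not taken from the earlier answer.

```python
import http.client
import itertools


def source_ports(first, last):
    """Return an endless round-robin iterator over ports first..last inclusive."""
    return itertools.cycle(range(first, last + 1))


def open_connection(host, port, local_host, ports):
    """Create an HTTPConnection that will bind to the next port from `ports`.

    No socket is opened here; the bind happens when a request is sent.
    """
    local_port = next(ports)
    return http.client.HTTPConnection(
        host, port, source_address=(local_host, local_port)
    )


# Usage sketch (the range 20000-29999 is an arbitrary choice):
#
#     ports = source_ports(20000, 29999)
#     conn = open_connection('localhost', 1234, 'localhost', ports)
#     conn.request('GET', '/index.html')
```

    A port that is still in TIME_WAIT for the same destination can still fail when the request is sent, so the caller would need to catch the socket error and retry with the next port.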

    There is, of course, an upper limit to how many connections you will be able to make, and it is questionable what sort of application would need to make thousands of connections in rapid succession.

  • 2021-01-07 14:07

    As mhawke suggested, TIME_WAIT seems the most likely cause. A system-wide fix for your situation is to adjust kernel parameters so that such connections are cleaned up sooner. There are two options:

    $ sysctl net.ipv4.tcp_tw_recycle=1
    

    This lets the kernel recycle connections in the TIME_WAIT state quickly. It may cause issues with NAT setups, and the option was removed in Linux 4.12, so it is not available on newer kernels. The other option is:

    $ sysctl net.ipv4.tcp_max_orphans=8192
    $ sysctl net.ipv4.tcp_orphan_retries=1
    

    These tell the kernel to keep at most 8192 connections that are not attached to any user process and to retry only once before killing such TCP connections.

    Note that these changes are not permanent. Add the settings to /etc/sysctl.conf to make them persist across reboots.
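
    To confirm which values are actually in effect, the settings can be read back from /proc/sys, where each dotted sysctl name maps to a file path. The following is a small sketch for Python 3 on Linux. The helper names are made up for illustration, and a result of None means the running kernel does not expose that key.

```python
import os


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a string, or None if the key is absent."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().strip()
    except (IOError, OSError):
        return None


if __name__ == "__main__":
    for name in ("net.ipv4.tcp_tw_recycle",
                 "net.ipv4.tcp_max_orphans",
                 "net.ipv4.tcp_orphan_retries"):
        print("%s = %s" % (name, read_sysctl(name)))
```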

    http://code.google.com/p/lusca-cache/issues/detail?id=89#c4
    http://tldp.org/HOWTO/Adv-Routing-HOWTO/lartc.kernel.obscure.html
