How to handle IncompleteRead: in Python

Backend · Unresolved · 8 answers · 1633 views
面向向阳花 2020-12-05 10:12

I am trying to fetch some data from a website, but the request fails with an IncompleteRead error. The data I am trying to get is a huge set of nested links. I did some research and found a patch that catches the exception, but I am not sure how to apply it.

8 Answers
  • 2020-12-05 10:23

The link you included in your question is simply a wrapper that executes urllib's read() function and catches any incomplete-read exceptions for you. If you don't want to implement that entire patch, you can just wrap the reads in a try/except block. For example:

    import urllib2
    import httplib

    try:
        page = urllib2.urlopen(urls).read()
    except httplib.IncompleteRead, e:
        page = e.partial  # keep whatever was read before the connection dropped
    

For Python 3:

    import http.client
    from urllib import request

    try:
        page = request.urlopen(urls).read()
    except http.client.IncompleteRead as e:
        page = e.partial  # partial body received before the error
    
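    If you fetch many URLs, the same pattern can be folded into a small helper so the try/except is not repeated everywhere. A minimal sketch (the helper name read_with_partial is illustrative, not from the answer):

    import http.client
    from urllib import request

    def read_with_partial(url):
        """Return the full response body, or whatever partial body was received."""
        resp = request.urlopen(url)
        try:
            return resp.read()
        except http.client.IncompleteRead as e:
            return e.partial  # bytes received before the connection dropped
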
  • 2020-12-05 10:27

What worked for me was catching IncompleteRead as an exception and harvesting the data read in each iteration by putting this in a loop, as below. (Note: I am using Python 3.4.1, and the urllib library changed between 2.7 and 3.4.)

    import http.client
    import json
    import urllib.request

    def fetch_json(url, data=None):  # wrapper name is illustrative; the return implies a function
        try:
            requestObj = urllib.request.urlopen(url, data)
            responseJSON = ""
            while True:
                try:
                    responseJSONpart = requestObj.read()
                except http.client.IncompleteRead as icread:
                    # keep the partial data and try reading again
                    responseJSON = responseJSON + icread.partial.decode('utf-8')
                    continue
                else:
                    responseJSON = responseJSON + responseJSONpart.decode('utf-8')
                    break

            return json.loads(responseJSON)

        except Exception as RESTex:
            print("Exception occurred making REST call: " + str(RESTex))
    
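    A hypothetical call to the wrapper above (the URL is illustrative):

    result = fetch_json('http://example.com/api/data')
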
  • 2020-12-05 10:41

In my case, sending the request as HTTP/1.0 instead of HTTP/1.1 fixed the problem. Adding this did the trick:

    import httplib
    httplib.HTTPConnection._http_vsn = 10
    httplib.HTTPConnection._http_vsn_str = 'HTTP/1.0'
    

Then I make the request:

    import urllib2

    req = urllib2.Request(url, post, headers)  # post and headers are defined elsewhere
    filedescriptor = urllib2.urlopen(req)
    img = filedescriptor.read()
    

Afterwards I switch back to HTTP/1.1 (for connections that support 1.1):

    httplib.HTTPConnection._http_vsn = 11
    httplib.HTTPConnection._http_vsn_str = 'HTTP/1.1'
    

The trick is to use HTTP/1.0 instead of the default HTTP/1.1. HTTP/1.1 can handle chunked transfer encoding, but for some reason the web server doesn't, so we make the request over HTTP/1.0.

On Python 3, this will tell you:

    ModuleNotFoundError: No module named 'httplib'

In that case, use the http.client module instead, which solves the problem:

    import http.client as http
    http.HTTPConnection._http_vsn = 10
    http.HTTPConnection._http_vsn_str = 'HTTP/1.0'
    
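    Putting the Python 3 version together as a complete, minimal sketch (the URL is illustrative; note the version switch is process-global, so it is restored in a finally block):

    import http.client
    import urllib.request

    # Downgrade to HTTP/1.0 so the server does not use chunked transfer encoding
    http.client.HTTPConnection._http_vsn = 10
    http.client.HTTPConnection._http_vsn_str = 'HTTP/1.0'

    try:
        body = urllib.request.urlopen('http://example.com/').read()
    finally:
        # Restore the default HTTP/1.1 for connections that support it
        http.client.HTTPConnection._http_vsn = 11
        http.client.HTTPConnection._http_vsn_str = 'HTTP/1.1'
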
  • 2020-12-05 10:43

You can use requests instead of urllib2. requests is built on urllib3, so it rarely has this problem. Wrap it in a loop that retries up to 3 times, and it becomes much more robust. You can use it this way:

    import inspect
    import sys
    import time

    import requests

    # self.crawling holds the URL (this snippet comes from a class method)
    msg = None
    for i in [1, 2, 3]:
        try:
            r = requests.get(self.crawling, timeout=30)
            msg = r.text
            if msg: break
        except Exception as e:
            sys.stderr.write('Got error when requesting URL "' + self.crawling + '": ' + str(e) + '\n')
            if i == 3:
                sys.stderr.write('{0.filename}@{0.lineno}: Failed requesting from URL "{1}" ==> {2}\n'.format(
                    inspect.getframeinfo(inspect.currentframe()), self.crawling, e))
                raise e
            time.sleep(10 * (i - 1))
    
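    As an alternative to the hand-rolled loop, requests can delegate retries to urllib3's Retry helper. A minimal sketch, not part of the original answer (the URL and retry parameters are illustrative):

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    session = requests.Session()
    # Retry up to 3 times with exponential backoff on transient failures
    retries = Retry(total=3, backoff_factor=1, status_forcelist=[500, 502, 503, 504])
    session.mount('http://', HTTPAdapter(max_retries=retries))
    session.mount('https://', HTTPAdapter(max_retries=retries))

    r = session.get('http://example.com/', timeout=30)
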
  • 2020-12-05 10:45

    I tried all these solutions and none of them worked for me. What actually did work was using http.client (Python 3) directly instead of urllib:

    import http.client

    conn = http.client.HTTPConnection('www.google.com')
    conn.request('GET', '/')
    r1 = conn.getresponse()
    page = r1.read().decode('utf-8')
    

    This works every time, whereas with urllib it raised an IncompleteRead exception every time.

  • 2020-12-05 10:47

    I just catch an additional exception to get past this problem, like so:

    import logging

    import requests

    try:
        r = requests.get(url, timeout=timeout)
    except (requests.exceptions.ChunkedEncodingError, requests.ConnectionError) as e:
        logging.error("There is an error: %s" % e)
    