How to download chunked data with Python's urllib2


I'm trying to download a large file from a server with Python 2:

req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
data


                      
3 Answers

日久生厌 (2021-01-06 17:54)

I have the same problem.

I found that "Transfer-Encoding: chunked" often appears together with "Content-Encoding: gzip".

So we can request the compressed content and decompress it ourselves.

This works for me:

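import gzip
import urllib2
from StringIO import StringIO

url = "https://myserver/mylargefile.gz"  # URL from the question

req = urllib2.Request(url)
# Advertise only gzip, since that is the only encoding decoded below.
req.add_header('Accept-Encoding', 'gzip')
rsp = urllib2.urlopen(req)
body = rsp.read()
if rsp.info().get('Content-Encoding') == 'gzip':
    # The body is gzip-compressed: decompress it in memory.
    data = gzip.GzipFile(fileobj=StringIO(body)).read()
else:
    # The server sent the body uncompressed.
    data = body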
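
The approach above holds the whole compressed body in memory before decompressing it, which may be a problem for a large file. A minimal sketch of decompressing block by block instead, assuming the response can be treated as any file-like object with a read() method. The helper name iter_gunzip and the in-memory stand-in for the response are hypothetical, and the sketch handles a single gzip member only:

```python
import gzip
import io
import zlib

def iter_gunzip(stream, block_size=8192):
    """Yield decompressed pieces of a gzip-compressed file-like object."""
    # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        block = stream.read(block_size)
        if not block:  # an empty result signals end of stream
            break
        piece = decomp.decompress(block)
        if piece:
            yield piece
    tail = decomp.flush()
    if tail:
        yield tail

# Stand-in for the HTTP response: gzip-compressed bytes held in memory.
payload = b"hello chunked world\n" * 1000
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    gz.write(payload)
buf.seek(0)

restored = b"".join(iter_gunzip(buf, block_size=64))
```

With a real download, the object returned by urlopen would take the place of buf, and each yielded piece could be written straight to a file instead of being joined.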

                                                                        
                                                        
            
            
              
                
无人共我 (2021-01-06 18:00)

From the Python documentation on urllib2.urlopen:


  One caveat: the read() method, if the size argument is omitted or
  negative, may not read until the end of the data stream; there is no
  good way to determine that the entire stream from a socket has been
  read in the general case.


So, read the data in a loop:

import urllib2

req = urllib2.Request("https://myserver/mylargefile.gz")
rsp = urllib2.urlopen(req)
# Read fixed-size blocks until read() returns an empty string (end of stream).
data = rsp.read(8192)
while data:
    # .. Do Something ..
    data = rsp.read(8192)
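
One way to fill in the "Do Something" step is to write each block to an output file as it arrives, so the whole download never sits in memory. A minimal sketch, assuming only that source and destination are file-like objects. The helper name copy_stream and the in-memory stand-ins are hypothetical:

```python
import io

def copy_stream(src, dst, block_size=8192):
    """Copy src to dst in fixed-size blocks; return the number of bytes copied."""
    total = 0
    while True:
        block = src.read(block_size)
        if not block:  # an empty result signals end of stream
            break
        dst.write(block)
        total += len(block)
    return total

# Stand-ins for the HTTP response and the output file.
src = io.BytesIO(b"x" * 20000)
dst = io.BytesIO()
copied = copy_stream(src, dst)
```

For a real download, src would be the response object and dst a file opened in binary write mode. The standard library's shutil.copyfileobj performs the same kind of loop, so it may be enough on its own when the byte count is not needed.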

                                                                        
                                                        
            
            
              
                
一生所求 (2021-01-06 18:02)

If I'm not mistaken, the following worked for me - a while back:

data = ''
chunk = rsp.read()
while chunk:
    data += chunk
    chunk = rsp.read()


Each read returns one chunk, so keep reading until nothing more comes back.
I don't have documentation at hand to support this yet.
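
Growing a string with += can end up copying the accumulated data on every iteration. A possible variant of the same loop, collecting the pieces in a list and joining them once at the end. The helper name read_all and the explicit block size are assumptions, not part of the answer above:

```python
import io

def read_all(stream, block_size=8192):
    """Read a file-like object to the end and return its full contents."""
    pieces = []
    while True:
        piece = stream.read(block_size)
        if not piece:  # an empty result signals end of stream
            break
        pieces.append(piece)
    # Join once at the end instead of concatenating inside the loop.
    return b"".join(pieces)

# Stand-in for the HTTP response.
body = read_all(io.BytesIO(b"abc" * 5000), block_size=1024)
```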
                                                                        
                                                        
            
            
              
                