Read file in chunks - RAM-usage, read Strings from binary files

后端未结

关注

 3  856

i\'d like to understand the difference in RAM-usage of this methods when reading a large file in python.

Version 1, found here on stackoverflow:

def


                      
              相关标签:


      
      
        
          3条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  春和景丽        
                
              
                            
                2020-12-05 20:42
              
            
            
                                                                       
yield is the keyword in python used for generator expressions.  That means that the next time the function is called (or iterated on), the execution will start back up at the exact point it left off last time you called it.  The two functions behave identically; the only difference is that the first one uses a tiny bit more call stack space than the second.  However, the first one is far more reusable, so from a program design standpoint, the first one is actually better.

EDIT: Also, one other difference is that the first one will stop reading once all the data has been read, the way it should, but the second one will only stop once either f.read() or process_data() throws an exception.  In order to have the second one work properly, you need to modify it like so:

f = open(file, 'rb')
while True:
    piece = f.read(1024)  
    if not piece:
        break
    process_data(piece)
f.close()

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  有刺的猬        
                
              
                            
                2020-12-05 20:44
              
            
            
                                                                       
I think probably the best and most idiomatic way to do this would be to use the built-in iter() function with a sentinel value to create and use an iterable as shown below. Note that the last chunk might be less that the requested chunk size if the file size isn't an exact multiple of it.
from functools import partial

CHUNK_SIZE = 1024
filename = 'testfile.dat'

with open(filename, 'rb') as file:
    for chunk in iter(partial(file.read, CHUNK_SIZE), b''):
        process_data(chunk)

Update: Don't know when it was added, but almost exactly what's above is in now shown as an example in the official documentation of the iter() function.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  日久生厌        
                
              
                            
                2020-12-05 20:45
              
            
            
                                                                       
starting from python 3.8 you might also use an assignment expression (the walrus-operator):
with open('file.name', 'rb') as file:
    while chunk := file.read(1024)
        process_data(chunk)

the last chunk may be smaller than CHUNK_SIZE.
as read() will return b"" when the file has been read the while loop will terminate.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复