Why is reading one byte 20x slower than reading 2, 3, 4, … bytes from a file?

前端未结

I have been trying to understand the tradeoff between read and seek. For small "jumps", reading unneeded data is faster than skipping it with seek…

3 answers

执念已碎
2021-02-05 02:22

I have seen similar situations while working with Arduinos interfacing with EEPROM chips. Basically, to write to or read from a chip or data structure, you have to send a write/read enable command, send a starting location, and then grab the first character. If you grab multiple bytes, however, most chips auto-increment their target address register. Thus, there is a fixed overhead for starting each read/write operation. It's the difference between:


Start communications
Send read enable
Send read command
Send address 1
Grab data from target 1
End communications
Start communications
Send read enable
Send read command
Send address 2
Grab data from target 2
End communications


and 


Start communications
Send read enable
Send read command
Send address 1
Grab data from target 1
Grab data from target 2
End communications


In terms of machine instructions, reading multiple bytes per operation eliminates a lot of overhead. It's even worse when a chip requires you to idle for a few clock cycles after the read/write enable is sent, to let it finish an internal operation before the read or write can proceed.
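To make the difference concrete, here is a toy cost model that simply counts the steps in the two listings above, assuming (unrealistically) that every step takes the same time. The function names and the `SETUP_STEPS` constant are hypothetical, chosen only for illustration.

```python
# Toy cost model of the two transaction patterns listed above.
# Assumption: every bus step takes the same time; real chips differ.

# Per-transaction overhead: start, read enable, read command, send address, end.
SETUP_STEPS = 5


def steps_one_transaction_per_byte(n_bytes: int) -> int:
    """Each byte gets its own full transaction: overhead plus one 'grab data' step."""
    return n_bytes * (SETUP_STEPS + 1)


def steps_single_sequential_transaction(n_bytes: int) -> int:
    """One transaction; the chip auto-increments the address after each grab."""
    return SETUP_STEPS + n_bytes


for n in (1, 2, 64):
    print(n, steps_one_transaction_per_byte(n), steps_single_sequential_transaction(n))
# 1 6 6
# 2 12 7
# 64 384 69
```

With two bytes this reproduces the 12-step versus 7-step listings, and the gap grows linearly with the number of bytes read.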

小鲜肉
2021-02-05 02:37
Reading from a file handle byte by byte will generally be slower than reading it in chunks.

Without buffering, every read() call in Python corresponds to a C read() call, which is a system call requesting the next chunk of data. Reading one byte at a time from a 2 KB file therefore means roughly 2,000 calls to the kernel, each involving a function call, a request to the kernel, a wait for the response, and passing the result back through the return value.

Most notable here is the wait for the response: the system call blocks until the kernel has serviced the request, so every call pays the cost of switching into the kernel and back.

The fewer calls the better, so reading more bytes per call is faster; this is why buffered I/O is so widely used.
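One way to see the call count is to bypass Python's buffering and use `os.read()`, a thin wrapper around the C `read()` call, so on a regular file each invocation below should correspond to one system call. This is a sketch: the helper name `count_reads` and the 2 KiB test file are illustrative assumptions.

```python
import os
import tempfile


def count_reads(path: str, chunk_size: int) -> int:
    """Read the whole file with os.read() and return how many calls it took.

    The final call that returns b'' (end of file) is counted too.
    """
    fd = os.open(path, os.O_RDONLY)
    calls = 0
    try:
        while True:
            calls += 1
            if not os.read(fd, chunk_size):  # b'' signals end of file
                break
    finally:
        os.close(fd)
    return calls


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"x" * 2048)  # a 2 KiB file
    path = tmp.name

try:
    one_byte = count_reads(path, 1)  # one call per byte, plus the EOF call
    chunked = count_reads(path, 4096)  # one call for all the data, plus the EOF call
    print(one_byte, chunked)  # 2049 2
finally:
    os.remove(path)
```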

In Python, buffering is provided by io.BufferedReader, or, for files, through the buffering keyword argument of open().
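The effect of `io.BufferedReader` can be observed without touching the disk by wrapping a raw stream that counts how often it is asked for data. `CountingRaw` is a hypothetical in-memory stand-in for an unbuffered file, where each `readinto()` call plays the role of one system call; the 4096-byte buffer size is an arbitrary choice.

```python
import io


class CountingRaw(io.RawIOBase):
    """Hypothetical in-memory raw stream that counts readinto() calls."""

    def __init__(self, data: bytes) -> None:
        super().__init__()
        self._data = data
        self._pos = 0
        self.calls = 0

    def readable(self) -> bool:
        return True

    def readinto(self, b) -> int:
        # Each call stands in for one read() system call on a real file.
        self.calls += 1
        n = min(len(b), len(self._data) - self._pos)
        b[:n] = self._data[self._pos:self._pos + n]
        self._pos += n
        return n


def read_byte_by_byte(stream) -> int:
    """Call read(1) until EOF and return the number of bytes read."""
    total = 0
    while stream.read(1):
        total += 1
    return total


data = b"x" * 10_000

# Unbuffered: RawIOBase.read(1) calls readinto() once per byte.
unbuffered = CountingRaw(data)
n_unbuffered = read_byte_by_byte(unbuffered)

# Buffered: BufferedReader refills a 4096-byte buffer, so readinto() is rare.
raw = CountingRaw(data)
buffered = io.BufferedReader(raw, buffer_size=4096)
n_buffered = read_byte_by_byte(buffered)

print(unbuffered.calls, raw.calls)  # 10001 for the unbuffered stream, a handful for the buffered one
```

With real files you rarely build this by hand: `open(path, "rb")` already returns an `io.BufferedReader`, while `open(path, "rb", buffering=0)` returns an unbuffered `io.FileIO`, whose every `read()` goes straight to the operating system.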

感情败类
2021-02-05 02:40
I was able to reproduce the issue with your code. However, I noticed the following: can you verify that the issue disappears if you replace

file.seek(randint(0, file.raw._blksize), 1)


with

file.seek(randint(0, file.raw._blksize), 0)


in the setup? I think you might simply run out of data at some point while reading 1 byte at a time. The 2-byte, 3-byte, and later reads then have no data left to read, which is why they are so much faster.
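The difference between the two `seek()` calls is the `whence` argument: `1` (`os.SEEK_CUR`) moves relative to the current position, so repeated offsets keep accumulating, while `0` (`os.SEEK_SET`) jumps to an absolute offset. The sketch below uses an in-memory `io.BytesIO` and a fixed 100-byte stand-in for `file.raw._blksize` (both assumptions, not the asker's actual setup) to show how relative seeks quickly move past the end, after which every `read()` returns empty bytes.

```python
import io
import os

BLOCK = 100  # hypothetical stand-in for file.raw._blksize
JUMP = BLOCK // 2


def count_empty_reads(whence: int, iterations: int = 20) -> int:
    """Seek by JUMP bytes with the given whence, read 1 byte, and count empty reads."""
    f = io.BytesIO(bytes(BLOCK))  # BLOCK zero bytes
    empty = 0
    for _ in range(iterations):
        f.seek(JUMP, whence)
        if not f.read(1):  # b'' means the position is at or past the end of the data
            empty += 1
    return empty


print(count_empty_reads(os.SEEK_CUR))  # 19: offsets accumulate past the end
print(count_empty_reads(os.SEEK_SET))  # 0: every seek lands at byte 50
```

Once the position is past the end, a read copies no data and returns almost immediately, which would make the later benchmarks look artificially fast.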