Efficient way of reading a large txt file in Python

隐瞒了意图╮ 2021-01-27 06:35

I'm trying to open a txt file with 4605227 rows (305 MB).

The way I have done this before is:

import numpy as np

data = np.loadtxt('file.txt', delimiter='\t', dtype=str)


        
3 Answers
  • 2021-01-27 07:18

    You can read it directly in as a Pandas DataFrame, e.g.:

    import pandas as pd

    df = pd.read_csv(path)  # for the tab-delimited file in the question, pass sep='\t'
    

    If you want to read faster, you can use modin:

    import modin.pandas as pd

    df = pd.read_csv(path)  # same read_csv API as pandas, but parallelized
    

    https://github.com/modin-project/modin
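
    If the whole DataFrame does not fit comfortably in memory, read_csv can also stream the file in chunks (a minimal sketch assuming the tab-delimited file from the question; the chunk size is an arbitrary choice):

    import pandas as pd

    # chunksize makes read_csv yield DataFrames of at most 100000 rows each
    for chunk in pd.read_csv('file.txt', sep='\t', chunksize=100_000):
        # process each chunk here, e.g. filter or aggregate it
        print(chunk.shape)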

  • 2021-01-27 07:23

    Rather than reading it in with numpy, you could read it directly in as a Pandas DataFrame, e.g. using the pandas.read_csv function with something like:

    import pandas as pd

    df = pd.read_csv('file.txt', delimiter='\t', usecols=["a", "b", "c", "d", "e", "f", "g", "h", "i"])
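
    For a file this size it can also help to declare column dtypes up front, so pandas does not have to infer them (a hedged sketch; the dtypes below are placeholders, not taken from the question):

    import pandas as pd

    # explicit dtypes skip type inference and reduce memory use
    # (the types chosen here are hypothetical)
    df = pd.read_csv('file.txt', delimiter='\t',
                     usecols=["a", "b", "c", "d", "e", "f", "g", "h", "i"],
                     dtype={"a": "float32", "b": "float32", "c": "category"})
    print(df.memory_usage(deep=True))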
    
  • 2021-01-27 07:26

    Method 1:

    You can read the file in chunks. readlines() accepts a size hint, so you can cap how many bytes worth of lines are read per call.

    BUFFERSIZE = 2 ** 20  # size hint in bytes, e.g. roughly 1 MB of lines per call

    with open('inputTextFile', 'r') as inputFile:
        buffer_lines = inputFile.readlines(BUFFERSIZE)
        while buffer_lines:
            # logic goes here
            buffer_lines = inputFile.readlines(BUFFERSIZE)
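
    A plainer lazy alternative, if it fits the use case, is to iterate the file object itself, which also reads one line at a time without loading the whole file (a small sketch using the same hypothetical filename):

    with open('inputTextFile', 'r') as inputFile:
        for line in inputFile:
            # each iteration yields a single line; the whole file is never in memory
            pass  # logic goes here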
    

    Method 2:

    You can also use the mmap module; the link below explains its usage.

    import mmap

    with open("hello.txt", "r+b") as f:
        # memory-map the file, size 0 means whole file
        mm = mmap.mmap(f.fileno(), 0)
        # read content via standard file methods
        print(mm.readline())  # prints b"Hello Python!\n"
        # read content via slice notation
        print(mm[:5])  # prints b"Hello"
        # update content using slice notation;
        # note that new content must have same size
        mm[6:] = b" world!\n"
        # ... and read again using standard file methods
        mm.seek(0)
        print(mm.readline())  # prints b"Hello  world!\n"
        # close the map
        mm.close()
    

    https://docs.python.org/3/library/mmap.html
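
    Applied to the 305 MB file from the question, the map can be opened read-only and iterated line by line without copying the file into memory (a sketch; the read-only access mode and the tab splitting are assumptions):

    import mmap

    with open('file.txt', 'rb') as f:
        # length 0 maps the whole file; ACCESS_READ keeps the mapping read-only
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            for line in iter(mm.readline, b""):  # mm.readline() returns b"" at EOF
                fields = line.rstrip(b"\n").split(b"\t")
                # logic goes here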
