I have a CSV file that looks like this:
TEST
2012-05-01 00:00:00.203 ON 1
2012-05-01 00:00:11.203 OFF 0
2012-05-01 00:00:22.203 ON 1
2012-05-01 00:00
How can I skip the lines that start with TEST when loading this file?
When you get the row from the csv.reader and can be sure that the first element is a string, you can use:

if not row[0].startswith('TEST'):
    process(row)
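For example, a minimal sketch in context, assuming the data sits in a file called file.csv and is space-delimited as in the sample above; process is just a hypothetical stand-in for whatever you do with each row:

import csv

def process(row):
    # hypothetical placeholder for the real per-row handling
    print(row)

with open('file.csv', newline='') as f:
    for row in csv.reader(f, delimiter=' '):
        if not row[0].startswith('TEST'):  # drop the TEST marker lines
            process(row)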
Another option, since I just ran into this problem also:
import pandas as pd
import subprocess

# grep -n prefixes each matching line with its 1-based line number, e.g. "1:TEST"
grep = subprocess.check_output(['grep', '-n', '^TEST', filename]).decode().splitlines()
bad_lines = [int(s[:s.index(':')]) - 1 for s in grep]  # 0-based row numbers for skiprows
df = pd.read_csv(filename, skiprows=bad_lines)
It's less portable than @eumiro's (read: probably doesn't work on Windows) and requires reading the file twice, but has the advantage that you don't have to store the entire file contents in memory.
You could of course do the same thing as the grep in Python, but it'd probably be slower.
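A minimal sketch of that pure-Python variant, reusing the filename placeholder from the snippet above and assuming one extra pass over the file is acceptable:

import pandas as pd

bad_lines = []
with open(filename) as f:
    for i, line in enumerate(f):  # i is the 0-based row number that skiprows expects
        if line.startswith('TEST'):
            bad_lines.append(i)
df = pd.read_csv(filename, skiprows=bad_lines)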
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.io.parsers.read_csv.html?highlight=read_csv#pandas.io.parsers.read_csv
skiprows : list-like or integer
    Row numbers to skip (0-indexed) or number of rows to skip (int)
Pass skiprows=[0, 6] to skip the rows with "TEST".
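As a minimal sketch, assuming the file is called file.csv and, as in the sample, is space-separated with no header row (sep and header here are assumptions, not part of the quoted docs):

import pandas as pd

# rows 0 and 6 (0-indexed) are the lines starting with "TEST"
df = pd.read_csv('file.csv', skiprows=[0, 6], sep=' ', header=None)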
from cStringIO import StringIO
import pandas

s = StringIO()
with open('file.csv') as f:
    for line in f:
        if not line.startswith('TEST'):
            s.write(line)
s.seek(0)  # "rewind" to the beginning of the StringIO object
pandas.read_csv(s)  # with further parameters…
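For Python 3, where the cStringIO module no longer exists, the same approach should work by swapping the first import for the standard io module:

from io import StringIO  # Python 3 replacement for cStringIO.StringIO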