Ignore character while importing with pandas

前端未结


I could not find such an option in the documentation. A measuring device spits out everything in Excel:

<>


                      


      
      
        
4 Answers

        
                         				            
            
           
            
                              
                
              
              
                
                  不知归路        
                
              
                            
                2021-01-16 20:22
              
            
            
                                                                       
I have the same problem. My first line is:

# id ra dec ...


Here # is the comment character in Python. read_csv treats # as part of a column name, but it is not one.
The workaround I used was to define the column names manually:

import pandas as pd

headerlist = ['id', 'ra', 'dec', ...]
# header=0 discards the file's own header line; names supplies the replacement.
df = pd.read_csv('data.txt', index_col=False, header=0, names=headerlist)


Note that index_col=False is optional as far as this problem is concerned.

If there is an option to ignore a certain character in the header line, I haven't found it. I hope this solution can be improved upon.
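
As a minimal sketch of the workaround above, the snippet below runs it against a small in-memory sample. The sample data and the use of io.StringIO in place of a real file are assumptions made for illustration; the sample is comma-separated because the answer's call uses the default separator.

```python
import io

import pandas as pd

# Hypothetical sample: a '#'-prefixed header line followed by two data rows.
raw = "# id,ra,dec\n1,10.5,-3.2\n2,11.0,-2.9\n"

names = ["id", "ra", "dec"]

# header=0 tells pandas that the first line is a header and should be dropped;
# names then replaces whatever that line contained.
df = pd.read_csv(io.StringIO(raw), header=0, names=names)

print(df)
```

The column names come entirely from the list, so the '#' in the file never reaches the DataFrame.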
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  一个人的身影        
                
              
                            
                2021-01-16 20:26
              
            
            
                                                                       
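Another option would be to read the header line yourself and pass the rest of the file to pandas:

import pandas as pd

with open(fname) as f:
    # Consume the header line, drop the '#', and split it into column names.
    names = f.readline().replace('#', '').split()
    data1 = pd.read_csv(f, sep=r'\s+', names=names, dtype=float)


You might have a different separator though.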
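
The same idea can be wrapped in a small helper. This is only a sketch: the function name read_with_hash_header and the sample data are made up for illustration, and it assumes a whitespace-separated file whose first line is the '#'-prefixed header.

```python
import io

import pandas as pd


def read_with_hash_header(handle):
    """Read a whitespace-separated table whose first line is '# col1 col2 ...'.

    `handle` is any open text file-like object positioned at the start.
    """
    # Take the header line off the handle before pandas sees it.
    names = handle.readline().lstrip("#").split()
    # pandas continues reading from the current position of the handle.
    return pd.read_csv(handle, sep=r"\s+", names=names)


# Hypothetical sample standing in for a real file.
sample = io.StringIO("# id ra dec\n1 10.5 -3.2\n2 11.0 -2.9\n")
df = read_with_hash_header(sample)
print(df)
```

For a real file, the call would look like `with open(fname) as f: df = read_with_hash_header(f)`.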
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  囚心锁ツ        
                
              
                            
                2021-01-16 20:34
              
            
            
                                                                       
I have the same problem. My first line is:

# id x y ...


So the pandas header keyword doesn't work on its own. I worked around it by reading the file twice:

# First pass: read only the header line as data, then drop the '#' column.
cos_phot_header = pd.read_csv(table, sep=r'\s+', header=None, engine='python', nrows=1)
cos_plot_text_header = cos_phot_header.drop(0, axis=1).values.tolist()
# Second pass: comment='#' skips the header line; names supplies the columns.
cos_phot_data = pd.read_csv(table, skip_blank_lines=True, comment='#',
               sep=r'\s+', header=None, engine='python', names=cos_plot_text_header[0])


I don't understand why pandas has no option for this, since it is a very common problem. You can also read the table with no data rows (nrows=0) and use .columns, but honestly I think that is an equally ugly solution.
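
The nrows=0 variant mentioned above might look like the following sketch. The sample data is invented, and io.StringIO stands in for the file; it is built twice because the table is parsed twice.

```python
import io

import pandas as pd

# Hypothetical sample with a '#'-prefixed, whitespace-separated header.
raw = "# id x y\n1 0.5 2.0\n2 1.5 3.0\n"

# Pass 1: nrows=0 parses only the header line and returns an empty frame
# whose columns are the header tokens, including the stray '#'.
header = pd.read_csv(io.StringIO(raw), sep=r"\s+", nrows=0)
names = [col for col in header.columns if col != "#"]

# Pass 2: comment='#' makes pandas skip the header line entirely,
# so the cleaned names are supplied explicitly.
data = pd.read_csv(
    io.StringIO(raw), sep=r"\s+", comment="#", header=None, names=names
)

print(data)
```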
                                                                        
                                                        
            
            
              
                
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  别那么骄傲        
                
              
                            
                2021-01-16 20:43
              
            
            
                                                                       
The sep argument of pandas read_csv() accepts a regular expression, which is handled by the Python parsing engine. With a negative lookbehind you can avoid splitting on white space that is preceded by a particular character (in your case #). Just as an example, avoiding "!":

sep=r'(?<!\!)\s+'


If you want, you can then rename the first column to remove the initial character and white space.

cheers
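
Adapted to the '#' case from the question, a sketch of this approach could look like the one below. The sample data is invented, and the pattern assumes exactly one space between '#' and the first column name.

```python
import io

import pandas as pd

# Hypothetical sample with a '#'-prefixed, whitespace-separated header.
raw = "# id ra dec\n1 10.5 -3.2\n2 11.0 -2.9\n"

# Split on runs of white space, except where the white space directly
# follows '#'. The first header token therefore stays together as "# id".
# A regex separator needs engine="python".
df = pd.read_csv(io.StringIO(raw), sep=r"(?<!#)\s+", engine="python")

# Strip the leading '#' and space from the column names.
df = df.rename(columns=lambda name: name.lstrip("# "))

print(df.columns.tolist())
```

Unlike the other answers, this needs only a single pass over the file, at the cost of using the slower Python engine.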
                                                                        
                                                        
            
            
              
                