Bash: Parse CSV with quotes, commas and newlines

Asked by 前端未结

Say I have the following csv file:

 id,message,time
 123,"Sorry, This message
 has commas and newlines",2016-03-28T20:26:39
 456,"It makes the problem non

7 answers

Answer by 后悔当初, 2020-12-11 16:24

As chepner said, you should use a programming language that can parse CSV properly.

Here is an example in Python:

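import csv

# Python 3: open in text mode with newline='' so the csv module itself
# handles the newlines embedded in quoted fields
with open('a.csv', newline='') as csvfile:
    reader = csv.reader(csvfile, quotechar='"')  # '"' is also the default
    for row in reader:
        print(row[-1])  # row[-1] gives the last column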
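To check that the csv module really copes with the quoted comma and the embedded newline, here is a minimal self-contained sketch that feeds the sample to csv.reader through io.StringIO instead of a file. The helper name last_column is just for illustration, and the second message is a hypothetical stand-in because the question's sample is cut off.

```python
import csv
import io

# Sample shaped like the question's data; the second message is a
# hypothetical stand-in because the original sample is truncated.
SAMPLE = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"Another, hypothetical message",2016-03-28T20:26:41\n'
)


def last_column(text):
    """Return the last field of every CSV record, header row included."""
    # newline='' matches how a real file should be opened for csv.reader
    reader = csv.reader(io.StringIO(text, newline=''))
    return [row[-1] for row in reader]


for value in last_column(SAMPLE):
    print(value)
```

It prints time followed by the two timestamps; for a real file, pass open('a.csv', newline='') to csv.reader as in the answer.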

Answer by 隐瞒了意图╮, 2020-12-11 16:25

awk -F, '!/This/{print $NF}' file

Output:

time
2016-03-28T20:26:39
2016-03-28T20:26:41
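This one-liner relies on the sample data: it skips every physical line containing "This" (the first half of the multi-line record) and prints the last comma-separated field of the others. A rough Python sketch to illustrate that, assuming two hypothetical samples (SAMPLE follows the question's first record, OTHER is invented) and an illustrative helper this_filter that imitates the awk command:

```python
import csv
import io

# Hypothetical samples shaped like the question's data.
SAMPLE = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
)
# Same structure, but the message's first line does not contain "This".
OTHER = (
    'id,message,time\n'
    '789,"Hello, world\n'
    'second line",2016-03-28T20:26:42\n'
)


def this_filter(text):
    """Imitate awk -F, '!/This/{print $NF}': skip lines containing "This"
    and return the last comma-separated field of every other physical line."""
    return [line.split(',')[-1] for line in text.splitlines() if 'This' not in line]


def csv_last_fields(text):
    """Last field of each logical CSV record, using a real CSV parser."""
    return [row[-1] for row in csv.reader(io.StringIO(text, newline=''))]


print(this_filter(SAMPLE))    # same result as the parser on this sample
print(this_filter(OTHER))     # includes ' world' from inside the quoted message
print(csv_last_fields(OTHER))
```

On OTHER, the quoted fragment ' world' leaks into the output, while the csv module still returns only the header and the timestamp.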

Answer by 旧时难觅i, 2020-12-11 16:30

Another awk alternative, using the double quote as the field separator (FS):

$ awk -F'"' '!(NF%2){getline remainder;$0=$0 OFS remainder}
                NR>1{sub(/,/,"",$NF); print $NF}' file

2016-03-28T20:26:39
2016-03-28T20:26:41
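Why !(NF%2) works: with " as the field separator, a line that opens a quoted field without closing it contains an odd number of quote characters, so it splits into an even number of fields, and the script pulls in the next line with getline. The same quote-parity idea as a Python sketch (join_records is a hypothetical helper, the sample's second message is invented, and the last column is assumed to be unquoted and comma-free):

```python
# Hypothetical sample shaped like the question's data.
SAMPLE = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"Another, hypothetical message",2016-03-28T20:26:41\n'
)


def join_records(text):
    """Join physical lines into logical records by quote parity.

    An odd number of '"' characters means a quoted field is still open,
    so the next line belongs to the same record. Lines are joined with a
    space, like awk's default OFS.
    """
    records, pending = [], None
    for line in text.splitlines():
        pending = line if pending is None else pending + ' ' + line
        if pending.count('"') % 2 == 0:  # quotes balanced: record complete
            records.append(pending)
            pending = None
    if pending is not None:  # unterminated quote at end of input
        records.append(pending)
    return records


# Assumes the last column is never quoted and contains no commas.
for record in join_records(SAMPLE)[1:]:  # [1:] skips the header, like NR>1
    print(record.rsplit(',', 1)[-1])
```

One design difference: the awk version joins at most one extra line, whereas this loop keeps joining until the quotes balance, so a message spanning three or more lines also ends up in one record. Doubled quotes ("") inside a field add two to the count, so they do not upset the parity.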

Answer by 情歌与酒, 2020-12-11 16:39

As said here

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file.csv \
 | awk -F, '{print $NF}'


To remove only the newlines inside double-quoted strings and leave the ones outside them alone, use GNU awk (it is needed for RT):

gawk -v RS='"' 'NR % 2 == 0 { gsub(/\n/, "") } { printf("%s%s", $0, RT) }' file


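This works by splitting the file on " characters: every even-numbered record (NR % 2 == 0) lies inside a quoted string, so newlines are removed only from those records, and printing RT puts each quote back.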

Output of the full pipeline (gawk followed by awk -F,):

time
2016-03-28T20:26:39
2016-03-28T20:26:41


The second awk in the pipeline then splits each joined line on commas and prints the last column.
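The gawk trick can be mirrored in Python to see what it does: split the text on ", strip newlines from every other chunk, and glue the pieces back together. A minimal sketch, where strip_quoted_newlines is a hypothetical helper and the sample's second message is invented:

```python
# Hypothetical sample shaped like the question's data.
SAMPLE = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"Another, hypothetical message",2016-03-28T20:26:41\n'
)


def strip_quoted_newlines(text):
    """Remove newlines only inside double-quoted strings.

    Splitting on '"' puts quoted text in the odd-indexed chunks (gawk's
    even-numbered records, since NR starts at 1), so only those chunks
    lose their newlines before the quotes are put back.
    """
    chunks = text.split('"')
    for i in range(1, len(chunks), 2):
        chunks[i] = chunks[i].replace('\n', '')
    return '"'.join(chunks)


flat = strip_quoted_newlines(SAMPLE)
# Equivalent of the second stage, awk -F, '{print $NF}'
for line in flat.splitlines():
    print(line.split(',')[-1])
```

Like the gawk version, it removes the newline without inserting a space, so the message becomes "Sorry, This messagehas commas and newlines"; that doesn't matter here because only the last column is printed.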

Answer by 无人共我, 2020-12-11 16:39

CSV is a format that needs a proper parser (it can't be parsed reliably with regular expressions alone). If you have Python installed, use its csv module instead of plain Bash.

If not, consider csvkit, which provides many command-line tools for processing CSV files.

See also:


https://unix.stackexchange.com/questions/7425/is-there-a-robust-command-line-tool-for-processing-csv-files
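Following the csv-module advice, one option is csv.DictReader, which takes the field names from the header row so the column can be selected by name rather than by position. A short sketch with a hypothetical column_values helper and an in-memory sample:

```python
import csv
import io


def column_values(stream, name):
    """Return the named column from every record of a CSV stream.

    csv.DictReader reads the field names from the header row, so the
    column can be picked by name ("time") instead of by position.
    """
    return [record[name] for record in csv.DictReader(stream)]


# Hypothetical sample shaped like the question's data.
SAMPLE = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
)

print(column_values(io.StringIO(SAMPLE, newline=''), 'time'))
```

With a real file, pass open('file.csv', newline='') instead of the StringIO object.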

Answer by 礼貌的吻别, 2020-12-11 16:43

sed -e 's/,/\n/g' file.csv | grep -E '^201[0-9]-'
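This works on the sample because no message fragment starts with a year, but it ignores quoting completely. Note also that \n in the sed replacement is a GNU sed extension, and the pattern only matches years 2010 to 2019. A Python sketch of the same filter (date_fields and both samples are hypothetical; OTHER is invented to show a false positive):

```python
import re

# Hypothetical samples shaped like the question's data.
SAMPLE = (
    'id,message,time\n'
    '123,"Sorry, This message\n'
    'has commas and newlines",2016-03-28T20:26:39\n'
    '456,"Another, hypothetical message",2016-03-28T20:26:41\n'
)
OTHER = (
    'id,message,time\n'
    '789,"moved from,2019-05-01",2016-03-28T20:26:42\n'
)

# Same test as grep -E '^201[0-9]-': a field starting with a 2010s year.
YEAR_PREFIX = re.compile(r'^201[0-9]-')


def date_fields(text):
    r"""Imitate sed 's/,/\n/g' | grep -E '^201[0-9]-'."""
    # Turning every comma into a newline splits the text on commas and
    # line breaks alike; quotes are not treated specially at all.
    pieces = text.replace(',', '\n').splitlines()
    return [p for p in pieces if YEAR_PREFIX.match(p)]


print(date_fields(SAMPLE))
print(date_fields(OTHER))  # also picks up the date inside the quoted message
```

In OTHER, the date inside the quoted message is printed as well, trailing quote included.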