Ignoring embedded spaces with AWK

后端未结


I'm looking for a simple way to print a specific field with awk while allowing for embedded spaces in the field.

Sample: Field1 Field2 "Field Three" Field


                      
4 answers

北恋, 2020-12-20 09:11

Parsing CSV-like input can be tricky, so I prefer a language with a proper CSV parsing module. For example, with Ruby you can parse the given line using a space as the column separator and the default double-quote quoting character:

ruby -rcsv -ne 'row = CSV.parse_line($_, col_sep: " "); puts row[2]' <<END
Field1 Field2 "Field Three" Field4
END


Field Three

                                                                        
                                                        
            
            
              
                
野的像风, 2020-12-20 09:18

In gawk you can describe what a field looks like with FPAT, instead of describing the separator, using something like:

awk 'BEGIN{FPAT = "([^ ]+)|(\"[^\"]+\")"}{print $3}' input.txt


Output:

"Field Three"


It may need more work to fit your needs completely; for example, the surrounding quotes are still part of the field.

FPAT requires gawk 4.0 or later, see https://lists.gnu.org/archive/html/info-gnu/2011-06/msg00013.html
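
If the surrounding quotes are unwanted, they can be stripped after FPAT has done the splitting. A minimal sketch, assuming gawk 4.0+ is installed as gawk; the function name third_field_unquoted is made up for illustration:

```shell
# Hypothetical helper: print field 3 of each line on stdin, treating a
# double-quoted run as one field and dropping the surrounding quotes.
# Assumes gawk 4.0+ (FPAT is a gawk extension, not POSIX awk).
third_field_unquoted() {
  gawk 'BEGIN { FPAT = "([^ ]+)|(\"[^\"]+\")" }
  {
    f = $3
    gsub(/^"|"$/, "", f)   # remove one leading and one trailing quote
    print f
  }'
}
```

Piping the sample line through it, e.g. printf '%s\n' 'Field1 Field2 "Field Three" Field4' | third_field_unquoted, should print Field Three without the quotes.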
                                                                        
                                                        
            
            
              
                
生来不讨喜, 2020-12-20 09:26

Mark Setchell's answer is good, although it will not work if you don't know in advance how many embedded quotes you have (and it doesn't split on spaces anymore).

I hacked this together (obviously it can be improved):

gawk -v FIELD=2 '{
  a = $FIELD
  if (substr(a, 1, 1) == "\"") {   # awk strings are 1-indexed
    gsub(/^"/, "", a)
    s = a
    for (i = FIELD + 1; i <= NF; i++) {
      a = $i
      nbSub = gsub(/"$/, "", a)    # nonzero once the closing quote is found
      s = s " " a
      if (nbSub > 0) break
    }
    print s
  }
}' <<<'allo "hello world" bar'


I would recommend using something other than gawk for this (maybe look into parsing the fields with your shell's IFS variable?).

Addendum: As I said above, this is not really the right tool for the job. For example, you can select the field with -v FIELD=N, but N counts fields as AWK's separator sees them, so the embedded spaces inside quotes still count as field boundaries.
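
One way around the field-counting problem in the addendum is to tokenize the line by content instead of relying on awk's own field splitting. A rough sketch in plain POSIX awk (no gawk extensions), assuming quotes are balanced, never escaped, and always start a field; qfield is a made-up name:

```shell
# qfield N: print the Nth logical field of each line on stdin, where a
# double-quoted run counts as a single field. Hypothetical helper; assumes
# balanced quotes and no escaped quotes inside a field.
qfield() {
  awk -v want="$1" '
  {
    line = $0
    n = 0
    # Peel off one token at a time: a quoted string or a run of non-blanks.
    # match() returns the leftmost-longest match, so a quoted string wins
    # over the shorter unquoted run that starts at the same position.
    while (match(line, /"[^"]*"|[^ \t]+/)) {
      tok = substr(line, RSTART, RLENGTH)
      line = substr(line, RSTART + RLENGTH)
      n++
      if (n == want) {
        gsub(/^"|"$/, "", tok)   # strip the surrounding quotes, if any
        print tok
        break
      }
    }
  }'
}
```

With this, qfield 2 on the line allo "hello world" bar should give hello world, and qfield 3 gives bar, because the quoted string is counted as one field.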
                                                                        
                                                        
            
            
              
                
迷失自我, 2020-12-20 09:30

You can do this if the double quotes are always there:

awk -F\" '{print $2}'


Specifically, I am telling awk that the fields are separated by double quotes, at which point the part you want is readily available as field 2.

If you need to get at subsequent fields, you can split the remainder of the line on spaces and get a new array, say F[] of fields, like this:

awk -F\" '{split($3,F," ");print $2,F[1],F[2]}' file

Field Three Field4 Field5


assuming your file looks like this:

Field1 Field2 "Field Three" Field4 Field5 Field6
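
The same trick generalizes: with a double quote as the separator, the even-numbered fields are the quoted strings and the odd-numbered ones are the text between them. A small sketch, again assuming the quotes are always balanced; quoted_fields is a hypothetical name:

```shell
# Hypothetical helper: print every double-quoted string found on stdin,
# one per line. Assumes balanced, unescaped quotes.
quoted_fields() {
  # With the double quote as FS, fields 2, 4, 6, ... are the quoted strings.
  awk -F'"' '{ for (i = 2; i <= NF; i += 2) print $i }'
}
```

For a line such as Field1 "Field Two" x "Field Four" y this should print Field Two and Field Four on separate lines.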

                                                                        
                                                        
            
            
              
                