awk to print all columns from the nth to the last with spaces

I have the following input file:

a 1  o p
b  2 o p p
c     3 o p p  p

In the last line there is a double space between the last two p's.


                      
4 answers

爱一瞬间的悲伤, 2021-01-23 22:06

GNU sed

remove first n fields

sed -r 's/([^ ]+ +){2}//' file


GNU awk 4.0+

awk '{sub("([^"FS"]+"FS"+){2}","")}1' file


GNU awk <4.0

awk --re-interval '{sub("([^"FS"]+"FS"+){2}","")}1' file


In case the FS version doesn't work (Ed's suggestion):

awk '{sub(/([^ ]+ +){2}/,"")}1' file


Replace 2 with the number of fields you wish to remove.
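
To avoid editing the pattern by hand each time, the count can be taken from a shell variable. The following is only a sketch: `drop_fields` is a hypothetical helper name, the `^` anchor is an addition, and `-E` is used on the assumption that your sed accepts it as the extended-regex switch (GNU sed treats it like `-r`).

```shell
# drop_fields N: hypothetical helper that removes the first N space-separated
# fields from each line read on standard input.
drop_fields() {
    n=$1
    # Double quotes let the shell substitute $n into the interval expression.
    sed -E "s/^([^ ]+ +){$n}//"
}

# Sample input from the question; spacing after the removed fields is kept.
printf 'a 1  o p\nb  2 o p p\nc     3 o p p  p\n' | drop_fields 2
```

Only the spaces belonging to the removed fields are consumed, so the double space in the last line survives.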

EDIT

Another way (doesn't require --re-interval):

awk '{for(i=0;i<2;i++)sub($1"[[:space:]]*","")}1' file


Further edit

As advised by Ed Morton, it is bad to use fields in sub() because they may contain regex metacharacters, so here is yet another alternative:

awk '{for(i=0;i<2;i++)sub(/[^[:space:]]+[[:space:]]*/,"")}1' file


Output

o p
o p p
o p p  p
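
The metacharacter caveat mentioned in the further edit can be seen with a first field such as `[1]`. This is an illustrative sketch with made-up input; it uses `[ \t]` instead of `[[:space:]]` only so that it also runs on awk builds without POSIX character classes.

```shell
# Made-up input whose first field looks like a bracket expression.
line='[1] a b'

# Field-as-regex version: "[1]" is read as a bracket expression matching "1",
# so only that single character is removed.
printf '%s\n' "$line" | awk '{sub($1"[ \t]*","")}1'

# Literal-pattern version: removes the whole first field and the blanks after it.
printf '%s\n' "$line" | awk '{sub(/[^ \t]+[ \t]*/,"")}1'
```

The first command leaves `[] a b`, while the second prints the intended `a b`.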

                                                                        
                                                        
            
            
              
                
不知归路, 2021-01-23 22:08

Since you want to preserve spaces, let's just use cut:

$ cut -d' ' -f2- file
1 o p
2 o p p
3 o p p  p


Or, for example, to start from column 4:

$ cut -d' ' -f4- file
p
p p
p p  p


This will work as long as the columns you are removing are one-space separated.
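
A short sketch of why that restriction matters, using a made-up line with two spaces after the first column: cut treats every single space as a delimiter, so a run of spaces produces empty fields.

```shell
# Made-up line whose first two columns are separated by two spaces.
line='b  2 o p p'

# Field 2 is the empty string between the two spaces, so the output
# starts with a stray space.
printf '%s\n' "$line" | cut -d' ' -f2-

# Squeezing runs of spaces first makes the field numbers line up, but it
# also destroys the spacing the question wants to preserve.
printf '%s\n' "$line" | tr -s ' ' | cut -d' ' -f2-
```

The first command prints ` 2 o p p` (with a leading space) and the second prints `2 o p p`.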



If the columns you are removing are separated by varying amounts of whitespace, you can use the beautiful solution by Ed Morton in "Print all but the first three columns":

awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){1}/,"")}1'
                                                   ^
                                        number of cols to remove


Test

$ cat a
a 1 o p
b    2 o p p
c  3 o p p  p
$ awk '{sub(/[[:space:]]*([^[:space:]]+[[:space:]]+){2}/,"")}1' a
o p
o p p
o p p  p

                                                                        
                                                        
            
            
              
                
甜味超标, 2021-01-23 22:12

In Perl, you can use split with capturing to keep the delimiters:

perl -ne '@f = split /( +)/; print @f[ 1 * 2 .. $#f ]'
#                                      ^
#                                      |
#                              column number goes
#                              here (starting from 0)

                                                                        
                                                        
            
            
              
                
梦如初夏, 2021-01-23 22:25

If you want to preserve all spaces after the start of the second column, this will do the trick:

{
    # Match the first token plus the blanks (spaces or tabs) that follow it.
    match($0, ($1 "[ \\t]+"))
    print substr($0, RSTART+RLENGTH)
}


The call to match finds the first 'token' on the line together with the whitespace that follows it, setting RSTART to where that match begins and RLENGTH to its length. Then you just print everything on the line after that.

You could generalize it somewhat to ignore the first N tokens this way:

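BEGIN {
    N = 2    # number of leading tokens to skip
}

{
    # Build a regex from the first N tokens, each followed by blanks.
    r = ""
    for (i=1; i<=N; i++) {
        r = (r $i "[ \\t]+")
    }
    match($0, r)
    print substr($0, RSTART+RLENGTH)
}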


Applying the above script to your example input yields:

o p
o p p
o p p  p
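
Because the script above builds its regex from field contents, tokens containing regex metacharacters could confuse it. A possible variation, offered only as a sketch, builds the pattern from a generic "token plus blanks" unit instead. The helper name `skip_tokens`, the leading `^[ \t]*`, and the empty-line fallback are all assumptions, not part of the original answer.

```shell
# skip_tokens N [FILE...]: hypothetical helper that prints each line from
# token N+1 onwards, keeping the original spacing of the remainder.
skip_tokens() {
    n=$1
    shift
    awk -v N="$n" '
    BEGIN {
        # Optional leading blanks, then one "token plus blanks" unit per
        # token to skip.
        r = "^[ \t]*"
        for (i = 1; i <= N; i++)
            r = r "[^ \t]+[ \t]+"
    }
    {
        # Print what follows the skipped prefix, or an empty line when the
        # record has N or fewer tokens.
        if (match($0, r))
            print substr($0, RSTART + RLENGTH)
        else
            print ""
    }' "$@"
}

printf 'a 1  o p\nb  2 o p p\nc     3 o p p  p\n' | skip_tokens 2
```

Since the pattern never contains text from the input, a token such as `[1]` or `a.b` is skipped like any other.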

                                                                        
                                                        
            
            
              
                