extract a string after a pattern

后端未结

关注

 4  985

I want to extract the numbers following client_id and id and pair up client_id and id in each line.

For example, for the following lines of log,

User(cli


                      
              相关标签:


      
      
        
          4条回答        

        
                         				            
            
           
            
                              
                
              
              
                
                  情话喂你        
                
              
                            
                2021-02-09 12:12
              
            
            
                                                                       
Here's a awk script that works (I put it on multiple lines and made it a bit more verbose so you can see what's going on):

#!/bin/bash

awk 'BEGIN{FS="[\(\):,]"}
/client_id/ {
cid="no_client_id"
for (i=1; i<NF; i++) {
    if ($i == "client_id") {
        cid = $(i+1)
    } else if ($i == "id") {
        id = $(i+1);
        print cid OFS id;
    }
 }
}' input_file_name


Output:

03 204
03 491
03 29
04 209
04 301
05 20


Explanation:


awk 'BEGIN{FS="[\(\):,]"}: invoke awk, use ( ) : and , as delimiters to separate your fields
/client_id/ {: Only do the following for the lines that contain client_id:
for (i=1; i<NF; i++) {: iterate through the fields on each line one field at a time
if ($i == "client_id") { cid = $(i+1) }: if the field we are currently on is client_id, then its value is the next field in order.
else if ($i == "id") { id = $(i+1); print cid OFS id;}: otherwise if the field we are currently on is id, then print the client_id : id pair onto stdout
input_file_name: supply the name of your input file as first argument to the awk script.

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  名媛妹妹        
                
              
                            
                2021-02-09 12:27
              
            
            
                                                                       
I would prefer awk for this, but if you were wondering how to do this with sed, here's one way that works with GNU sed.

parse.sed

/client_id/ {
  :a
  s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/
  ta
  s/^[^\n]+\n//
}


Run it like this:

sed -rf parse.sed infile


Or as a one-liner:

<infile sed '/client_id/ { :a; s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/; ta; s/^[^\n]+\n//; }'


Output:

03 204
03 491
03 29

04 209
04 301

05 20


Explanation:

The idea is to repeatedly match client_id:([0-9]+) and id:([0-9]+) pairs and put them at the end of pattern space. On each pass the id:([0-9]+) is removed.

The final replace removes left-overs from the loop.
                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  时光取名叫无心        
                
              
                            
                2021-02-09 12:28
              
            
            
                                                                       
This might work for you (GNU sed):

sed -r '/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d;s//\2 \3\n\1/;P;D' file



/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d if the line doesn't have the intended strings delete it.
s//\2 \3\n\1/ re-arrange the line by copying the client_id and moving the first id ahead thus reducing the line for successive iterations.
P print upto the introduced newline.
D delete upto the introduced newline.

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
            
           
            
                              
                
              
              
                
                  别跟我提以往        
                
              
                            
                2021-02-09 12:31
              
            
            
                                                                       
This may work for you:

awk -F "[):,]" '{ for (i=2; i<=NF; i++) if ($i ~ /id/) print $2, $(i+1) }' file


Results:

03 204
03 491
03 29
04 209
04 301
05 20

                                                                        
                                                        
            
            
              
                
                0
              
                 
                
               讨论(0)
              
              
                                                   
              
                                                            
            
                      
                    


               
            
    发布评论:
    
         
                        
    
    提交评论 
  
  

                    
                    
                    
                        
                        
                         加载中...
                        
                    
                
          
          	          
                             
        
        
          
            
            
              
              
            
    


                                 
              
            
                          
    

        
         
                验证码
                
                  
                
                
                   看不清?
                
              
                                  
                    
   
                 
             
              提交回复