extract a string after a pattern

失恋的感觉 2021-02-09 11:33

I want to extract the numbers following client_id and id and pair up client_id and id in each line.

For example, for the following lines of log,

User(cli         


        
4 answers
  • 2021-02-09 12:12

    Here's an awk script that works (I put it on multiple lines and made it a bit more verbose so you can see what's going on):

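    #!/bin/bash

    # Split fields on any of: ( ) : ,
    # (inside a bracket expression the parentheses need no backslash)
    awk 'BEGIN { FS = "[():,]" }
    /client_id/ {
        cid = "no_client_id"    # fallback if no client_id field is found
        # stop at NF-1 because the loop body reads $(i+1)
        for (i=1; i<NF; i++) {
            if ($i == "client_id") {
                cid = $(i+1)
            } else if ($i == "id") {
                id = $(i+1);
                print cid OFS id;
            }
        }
    }' input_file_name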
    

    Output:

    03 204
    03 491
    03 29
    04 209
    04 301
    05 20
    

    Explanation:

    • BEGIN { FS = ... }: before any input is read, set the field separator to a bracket expression so that (, ), : and , all split fields.
    • /client_id/ {: run the block only for lines that contain client_id.
    • for (i=1; i<NF; i++) {: walk through the fields of each line one at a time.
    • if ($i == "client_id") { cid = $(i+1) }: if the current field is client_id, its value is the next field.
    • else if ($i == "id") { id = $(i+1); print cid OFS id; }: otherwise, if the current field is id, print the client_id and id pair to stdout, separated by OFS (a space by default).
    • input_file_name: the name of your input file, passed to awk after the program text.
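
Because the sample log in the question is cut off, the script can be tried on made-up input. The sketch below assumes lines shaped like User(client_id:NN) results:[RelatedUser(id:NN, weight:NN), ...]; that format, the sample_log variable and the pair_ids function name are assumptions for illustration, not taken from the question.

```shell
#!/bin/bash
# Hypothetical log lines: the real sample is truncated, so this format is an assumption.
sample_log='User(client_id:03) results:[RelatedUser(id:204, weight:10), RelatedUser(id:491, weight:10)]
User(client_id:04) results:[RelatedUser(id:209, weight:10)]'

# Same logic as the script above, reading from stdin instead of a file.
pair_ids() {
    awk 'BEGIN { FS = "[():,]" }
    /client_id/ {
        cid = "no_client_id"
        for (i = 1; i < NF; i++) {
            if ($i == "client_id") cid = $(i+1)
            else if ($i == "id") print cid OFS $(i+1)
        }
    }'
}

printf '%s\n' "$sample_log" | pair_ids
```

With this assumed input the script prints 03 204, 03 491 and 04 209, one pair per line.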
  • 2021-02-09 12:27

    I would prefer awk for this, but if you were wondering how to do this with sed, here's one way that works with GNU sed.

    parse.sed

    /client_id/ {
      :a
      s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/
      ta
      s/^[^\n]+\n//
    }
    

    Run it like this:

    sed -rf parse.sed infile
    

    Or as a one-liner:

    <infile sed -r '/client_id/ { :a; s/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/; ta; s/^[^\n]+\n//; }'
    

    Output:

    03 204
    03 491
    03 29
    
    04 209
    04 301
    
    05 20
    

    Explanation:

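    The idea is to repeatedly match a client_id:([0-9]+) and id:([0-9]+) pair and append it to the end of the pattern space as a new line. Each pass removes the matched id:([0-9]+) from the original text, so the loop ends once no id is left.

    The final substitution deletes the first line of the pattern space, which holds the leftovers of the original log line.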
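
To see what one pass of the loop does, the substitution can be run once without the :a / ta loop. This is a sketch for GNU sed on a made-up line; the log format, the line variable and the one_pass function name are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical log line (assumed format; the sample in the question is truncated).
line='User(client_id:03) results:[RelatedUser(id:204, weight:10), RelatedUser(id:491, weight:10)]'

# A single pass of the substitution from parse.sed, with no loop around it.
one_pass() {
    sed -r 's/(client_id:([0-9]+))[^(]+\(id:([0-9]+)([^\n]+)(.*)/\1 \4\5\n\2 \3/'
}

printf '%s\n' "$line" | one_pass
# Prints two lines:
#   User(client_id:03 , weight:10), RelatedUser(id:491, weight:10)]
#   03 204
```

The first id has been cut out of the log line and the pair 03 204 now sits on its own line at the end; the next pass would do the same for id:491.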

  • 2021-02-09 12:28

    This might work for you (GNU sed):

    sed -r '/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d;s//\2 \3\n\1/;P;D' file
    
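    • /.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d: if the line doesn't contain the intended strings, delete it.
    • s//\2 \3\n\1/: the empty regex reuses the previous one. The line is rearranged so that the client_id and the first id come first, followed by a newline and the kept (client_id:NN text, which shortens the line for successive iterations.
    • P: print up to the introduced newline.
    • D: delete up to the introduced newline and restart the cycle on what remains, without reading a new line of input.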
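
A quick way to check the !d filter and the P;D loop together is to feed the command a few made-up lines, one of which has no client_id/id pair. The log format, the sample_log variable and the pair_ids_sed function name below are assumptions for illustration; GNU sed is assumed, as in the answer.

```shell
#!/bin/bash
# Hypothetical input (assumed format); the middle line has no client_id/id pair.
sample_log='User(client_id:03) results:[RelatedUser(id:204, weight:10), RelatedUser(id:491, weight:10)]
some unrelated log line
User(client_id:05) results:[RelatedUser(id:20, weight:10)]'

# The one-liner from above, reading from stdin instead of a file.
pair_ids_sed() {
    sed -r '/.*(\(client_id:([0-9]+))[^(]*\(id:([0-9]+)/!d;s//\2 \3\n\1/;P;D'
}

printf '%s\n' "$sample_log" | pair_ids_sed
```

With this assumed input the unrelated line is dropped and the pairs 03 204, 03 491 and 05 20 are printed.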
  • 2021-02-09 12:31

    This may work for you:

    awk -F "[):,]" '{ for (i=2; i<=NF; i++) if ($i ~ /id/) print $2, $(i+1) }' file
    

    Results:

    03 204
    03 491
    03 29
    04 209
    04 301
    05 20
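
This answer comes without an explanation, so here is a small sketch of how -F "[):,]" splits a line. The log line is made up (the sample in the question is truncated), and the show_fields and pair_ids function names are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical log line (assumed format; the sample in the question is truncated).
line='User(client_id:03) results:[RelatedUser(id:204, weight:10)]'

# Print one "index=[field]" row per field to show how the separator splits the line.
show_fields() {
    awk -F "[):,]" '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'
}

# The command from the answer, reading from stdin instead of a file.
pair_ids() {
    awk -F "[):,]" '{ for (i=2; i<=NF; i++) if ($i ~ /id/) print $2, $(i+1) }'
}

printf '%s\n' "$line" | show_fields
printf '%s\n' "$line" | pair_ids
```

On this line the client_id value lands in field 2 and each id value lands in the field right after one that ends in id. The loop starts at i=2 because field 1 (User(client_id) also contains id and would otherwise be printed as a pair. Note that $i ~ /id/ matches any field containing id, so this approach relies on no other field name in the log containing those letters.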
    