How to use sed, awk, or gawk to print only what is matched?

长情又很酷 2021-01-30 05:59

I see lots of examples and man pages on how to do things like search-and-replace using sed, awk, or gawk.

But in my case, I have a regular expression that I want to run

11 answers
  • 2021-01-30 06:31

    I use Perl to make this easier for myself, e.g.:

    perl -ne 'print $1 if /.*abc([0-9]+)xyz.*/'
    

    This runs Perl. The -n option instructs Perl to read one line at a time from STDIN and execute the code on it; the -e option specifies the code to run.

    The code matches a regexp against each line read and, if it matches, prints the contents of the first set of parentheses ($1).

    You can also do this with multiple file names on the end, e.g.:

    perl -ne 'print $1 if /.*abc([0-9]+)xyz.*/' example1.txt example2.txt
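
    Note that the one-liner above prints $1 with no trailing newline, so captures from successive lines run together. A minimal variation, assuming a standard perl on the PATH: the -l option makes print append a newline, and the pattern needs no leading or trailing .* because it is unanchored. The function name extract_number_perl is only illustrative.

    ```shell
    extract_number_perl() {
        # -n: loop over input lines; -l: chomp each line and append "\n" on print
        # "$@" passes optional file names; with none, perl reads STDIN
        perl -nle 'print $1 if /abc([0-9]+)xyz/' "$@"
    }

    printf 'abc12345xyz\nfoo\nabc678xyz\n' | extract_number_perl
    # prints:
    # 12345
    # 678
    ```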

  • 2021-01-30 06:32
    gawk '/.*abc([0-9]+)xyz.*/' file
    
  • 2021-01-30 06:35

    If you want to select lines then strip out the bits you don't want:

    egrep 'abc[0-9]+xyz' inputFile | sed -e 's/^.*abc//' -e 's/xyz.*$//'
    

    It basically selects the lines you want with egrep and then uses sed to strip off the bits before and after the number.

    You can see this in action here:

    pax> echo 'a
    b
    c
    abc12345xyz
    a
    b
    c' | egrep 'abc[0-9]+xyz' | sed -e 's/^.*abc//' -e 's/xyz.*$//'
    12345
    pax> 
    

    Update: obviously, if your actual situation is more complex, the REs will need to be modified. For example, if you always had a single number buried within zero or more non-numerics at the start and end:

    egrep '^[^0-9]*[0-9]+[^0-9]*$' inputFile | sed -e 's/^[^0-9]*//' -e 's/[^0-9]*$//'
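
    The select-then-strip idea can also be done in a single sed process. This is a sketch, assuming a sed that accepts -E for extended regular expressions (GNU and BSD sed both do): -n suppresses the default output, and the p flag prints only the lines where the substitution succeeded. The function name extract_number_sed is hypothetical.

    ```shell
    extract_number_sed() {
        # Replace the whole line with capture group 1, then print it (p)
        # only if the substitution happened; -n hides every other line.
        sed -n -E 's/.*abc([0-9]+)xyz.*/\1/p' "$@"
    }

    printf 'a\nb\nabc12345xyz\nc\n' | extract_number_sed
    # prints: 12345
    ```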
    
  • 2021-01-30 06:37

    For awk, I would use the following script:

    /.*abc([0-9]+)xyz.*/ {
        print $0    # prints the whole matching line
        next
    }
    {
        # default: do nothing
    }
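
    The script above prints the whole matching line. To print only the digits, one option is awk's built-in match(), which sets RSTART and RLENGTH for the matched text. The sketch below relies on abc and xyz each being three characters long; extract_number_awk is a hypothetical name.

    ```shell
    extract_number_awk() {
        awk 'match($0, /abc[0-9]+xyz/) {
            # RSTART and RLENGTH describe the text "abc<digits>xyz";
            # skip the 3-character prefix and drop the 3-character suffix.
            print substr($0, RSTART + 3, RLENGTH - 6)
        }' "$@"
    }

    printf 'a\nabc12345xyz\nc\n' | extract_number_awk
    # prints: 12345
    ```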
    
  • 2021-01-30 06:38

    You can do it with the shell alone:

    while IFS= read -r line
    do
        case "$line" in
            *abc*[0-9]*xyz* ) 
                t="${line##*abc}"   # strip everything through the last "abc"
                echo "num is ${t%%xyz*}";;   # strip the first "xyz" and what follows
        esac
    done <"file"
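
    The case pattern above is a glob, not a regular expression, so it also accepts lines such as abcfoo1barxyz. A stricter sketch in plain POSIX shell is shown here, under the assumption that only an all-digit capture should be printed; extract_number_sh is a hypothetical name and it reads from STDIN.

    ```shell
    extract_number_sh() {
        while IFS= read -r line; do
            case "$line" in
                *abc*xyz*) ;;          # candidate line, keep going
                *) continue ;;         # no abc...xyz at all
            esac
            t="${line#*abc}"           # drop everything through the first "abc"
            t="${t%%xyz*}"             # drop the first "xyz" and what follows
            case "$t" in
                ''|*[!0-9]*) continue ;;   # reject empty or non-digit captures
            esac
            printf '%s\n' "$t"
        done
    }

    printf 'abc12345xyz\nabcfoo1barxyz\nzzz\n' | extract_number_sh
    # prints: 12345
    ```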
    
  • 2021-01-30 06:40

    The OP's case doesn't specify that there can be multiple matches on a single line, but for the Google traffic, I'll add an example for that too.

    Since the OP's need is to extract a group from a pattern, using grep -o will require 2 passes. But, I still find this the most intuitive way to get the job done.

    $ cat > example.txt <<TXT
    a
    b
    c
    abc12345xyz
    a
    abc23451xyz asdf abc34512xyz
    c
    TXT
    
    $ cat example.txt | grep -oE 'abc([0-9]+)xyz'
    abc12345xyz
    abc23451xyz
    abc34512xyz
    
    $ cat example.txt | grep -oE 'abc([0-9]+)xyz' | grep -oE '[0-9]+'
    12345
    23451
    34512
    

    Since processor time is basically free but human readability is priceless, I tend to refactor my code based on the question, "a year from now, what am I going to think this does?" In fact, for code that I intend to share publicly or with my team, I'll even open man grep to figure out what the long options are and substitute those. Like so: grep --only-matching --extended-regexp
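
    Spelled out with the long options, the two-pass pipeline might look like the sketch below. It assumes a grep that supports these long option names (GNU grep does); extract_numbers is a hypothetical name, and it reads STDIN only so that no file-name prefixes reach the second pass.

    ```shell
    extract_numbers() {
        # Pass 1: keep only the abc<digits>xyz matches, one per line.
        # Pass 2: keep only the digits inside each match.
        grep --only-matching --extended-regexp 'abc[0-9]+xyz' |
            grep --only-matching --extended-regexp '[0-9]+'
    }

    printf 'a\nabc12345xyz\nabc23451xyz asdf abc34512xyz\n' | extract_numbers
    # prints:
    # 12345
    # 23451
    # 34512
    ```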
