How to use sed, awk, or gawk to print only what is matched?

长情又很酷 2021-01-30 05:59

I see lots of examples and man pages on how to do things like search-and-replace using sed, awk, or gawk.

But in my case, I have a regular expression that I want to run against a file, printing only the part of each line that it matches.

11 answers
  • 2021-01-30 06:46

    If your version of grep supports it, you could use the -o option to print only the portion of each line that matches your regexp.

    If not, here's the best sed I could come up with:

    sed -e '/[0-9]/!d' -e 's/^[^0-9]*//' -e 's/[^0-9]*$//'
    

    ... which deletes (skips) lines with no digits and, for the remaining lines, removes all leading and trailing non-digit characters. (I'm only guessing that your intention is to extract the number from each line that contains one.)
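
    A quick sketch for trying both approaches on a made-up sample (the helper names digits_grep and digits_sed are hypothetical; the sed line is the command above, unchanged):

```shell
# Hypothetical helper names; each reads lines on stdin.
digits_grep() { grep -o '[0-9][0-9]*'; }   # needs a grep with -o (GNU and BSD grep have it)
digits_sed()  { sed -e '/[0-9]/!d' -e 's/^[^0-9]*//' -e 's/[^0-9]*$//'; }

# Made-up two-line sample: one line with a number, one without.
sample='This is an example abc12345xyz of a line
no digits here'

printf '%s\n' "$sample" | digits_grep   # 12345
printf '%s\n' "$sample" | digits_sed    # 12345
```

    The two differ when a line holds more than one number: for a1b2c, grep -o prints 1 and 2 on separate lines, while the sed version keeps everything between the first and last digit and prints 1b2.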

    The problem with something like:

    sed -e 's/.*\([0-9]*\).*/&/' 
    

    ... or

    sed -e 's/.*\([0-9]*\).*/\1/'
    

    ... is that sed only supports "greedy" matching, so the first .* matches as much of the line as it can (here, all of it, leaving \([0-9]*\) to match the empty string). Unless we use a negated character class to emulate a non-greedy match, or a version of sed with Perl-compatible or other extensions to its regexes, we can't extract a precise pattern match from within the pattern space (a line).
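
    To see the greedy behaviour and the negated-character-class workaround side by side, here is a small sketch on a made-up sample line (POSIX basic regexes only, so it should behave the same in GNU and BSD sed):

```shell
line='This is an example abc12345xyz of a line'   # made-up sample input

# The leading .* grabs the whole line, so \([0-9]*\) matches the
# empty string: this prints an empty line.
printf '%s\n' "$line" | sed -e 's/.*\([0-9]*\).*/\1/'

# Requiring at least one digit only makes .* give back the last digit: prints 5.
printf '%s\n' "$line" | sed -e 's/.*\([0-9][0-9]*\).*/\1/'

# Negated class: [^0-9]* cannot consume digits, so the group starts at
# the first digit and takes the whole run of digits: prints 12345.
printf '%s\n' "$line" | sed -n -e 's/^[^0-9]*\([0-9][0-9]*\).*$/\1/p'
```

    The negated-class version returns only the first run of digits on each line and, thanks to -n and p, prints nothing for lines without digits.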

  • 2021-01-30 06:47

    My sed (Mac OS X) didn't accept + in this basic regex, so I used * instead and added the p flag to print the match:

    sed -n 's/^.*abc\([0-9]*\)xyz.*$/\1/p' example.txt
    

    For matching at least one numeric character without +, I would use:

    sed -n 's/^.*abc\([0-9][0-9]*\)xyz.*$/\1/p' example.txt
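
    Another option, sketched here on made-up input: the -E flag (accepted by both GNU sed and BSD/macOS sed) switches to extended regexes, where + is available and the group parentheses need no backslashes.

```shell
# -E: extended regexes, so + works and ( ) need no escaping.
# -n plus the p flag: print only lines where the substitution happened.
printf 'This is an example abc12345xyz of a line\nabcxyz\nno match\n' |
  sed -nE 's/^.*abc([0-9]+)xyz.*$/\1/p'     # prints 12345

# With * instead of +, a line like abcxyz also "matches" and prints an empty line.
printf 'abcxyz\n' | sed -n 's/^.*abc\([0-9]*\)xyz.*$/\1/p'
```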
    
  • 2021-01-30 06:49

    perl is the cleanest syntax, but if you don't have perl (it isn't always installed), gawk's gensub() function lets you use the components (capture groups) of a regex:

    gawk '/abc[0-9]+xyz/ { print gensub(/.*abc([0-9]+)xyz.*/, "\\1", "g") }' < file
    

    For the sample input file, the output will be:

    12345
    

    Note: gensub() replaces the entire text matched by the regex (between the //), so you need .* before and after the group to get rid of the text before and after the number. Keep abc and xyz in the regex as well: .* is greedy, so with just /.*([0-9]+).*/ the leading .* would swallow every digit except the last one.

  • 2021-01-30 06:52

    With GNU awk (gawk), you can use the three-argument form of match() to access the captured group:

    $ awk 'match($0, /abc([0-9]+)xyz/, matches) {print matches[1]}' file
    12345
    

    This tries to match the pattern abc([0-9]+)xyz. If it matches, the captured groups are stored in the array matches: matches[1] holds the text matched by ([0-9]+) (matches[0] holds the whole match). Since match() returns the character position where the match begins (1 if it starts at the beginning of the string) and 0 when there is no match, the condition is true exactly for matching lines, which triggers the print action.
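
    The array argument is a gawk extension. For other awks, a sketch using only POSIX awk's two-argument match(), which sets RSTART and RLENGTH instead of filling an array (the function name between_abc_xyz is hypothetical, and the offsets assume the fixed 3-character delimiters abc and xyz):

```shell
# Hypothetical helper name. POSIX match() has no capture groups, but it
# sets RSTART (1-based start of the match) and RLENGTH (its length).
between_abc_xyz() {
  awk 'match($0, /abc[0-9]+xyz/) {
         # skip the 3-char "abc" prefix and drop the 3-char "xyz" suffix
         print substr($0, RSTART + 3, RLENGTH - 6)
       }'
}

printf 'This is an example abc12345xyz of a line\nno match\n' | between_abc_xyz   # 12345
```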


    With GNU grep's -P (Perl-compatible regular expressions) option, you can use a look-behind and look-ahead:

    $ grep -oP '(?<=abc)[0-9]+(?=xyz)' file
    12345
    
    $ grep -oP 'abc\K[0-9]+(?=xyz)' file
    12345
    

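    This matches [0-9]+ only where it is preceded by abc and followed by xyz, and prints just the digits: the look-behind (?<=abc), the look-ahead (?=xyz), and \K (which discards everything matched before it) keep abc and xyz out of the printed match.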
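
    -P is not part of POSIX grep, so not every grep provides it. A sketch of a fallback that needs only grep -o and sed: let grep print each full abc...xyz match, then strip the fixed delimiters (the sample input is made up):

```shell
# grep -o prints every abc<digits>xyz match on its own line;
# sed then removes the literal prefix and suffix.
printf 'abc12345xyz and abc678xyz\n' |
  grep -o 'abc[0-9][0-9]*xyz' |
  sed -e 's/^abc//' -e 's/xyz$//'
# prints 12345 and 678 on separate lines
```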

  • 2021-01-30 06:55

    You can use sed to do this:

     sed -rn 's/.*abc([0-9]+)xyz.*/\1/gp'
    
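    • -n suppresses the automatic printing of every line
    • -r enables extended regexes, so you don't have to escape the capture group parentheses () (-r is GNU sed's spelling; -E works in both GNU and BSD/macOS sed)
    • \1 is the text matched by the capture group
    • /g replaces globally (redundant here, since the pattern already consumes the whole line)
    • /p prints the result when a substitution was made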

    I wrote a tool for myself that makes this easier

    rip 'abc(\d+)xyz' '$1'
    