How to print matched regex pattern using awk?

说谎 2020-11-29 15:56

Using awk, I need to find a word in a file that matches a regex pattern.

I only want to print the word matched with the pattern.

So if

8 Answers
  • 2020-11-29 16:31

    gawk can print the matching part of every line with this action:

    { if (match($0,/your regexp/,m)) print m[0] }
    

    From the gawk manual (http://www.gnu.org/software/gawk/manual/gawk.html#String-Functions):

    match(string, regexp [, array]) If array is present, it is cleared, and then the zeroth element of array is set to the entire portion of string matched by regexp. If regexp contains parentheses, the integer-indexed elements of array are set to contain the portion of string matching the corresponding parenthesized subexpression.

  • 2020-11-29 16:42

    If Perl is an option, you can try this:

    perl -lne 'print $1 if /(regex)/' file
    

    To make the match case-insensitive, add the i modifier:

    perl -lne 'print $1 if /(regex)/i' file
    

    To print everything AFTER the match:

    perl -lne 'if ($found){print} else{if (/regex(.*)/){print $1; $found++}}' textfile
    

    To print the match and everything after the match:

    perl -lne 'if ($found){print} else{if (/(regex.*)/){print $1; $found++}}' textfile
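
    Since the question asks about awk, the same "print the match and everything after it" behaviour can be sketched in POSIX awk with match() and substr(). This is only an illustration: the pattern START, the function name tail_from_match and the sample text are placeholders, not part of the original answer.

```shell
# POSIX awk sketch: print from the first match to the end of the input.
# The pattern START and the sample text are placeholders.
tail_from_match() {
    awk '
        found { print; next }            # after the first hit, print every line
        match($0, /START/) {
            print substr($0, RSTART)     # the match and the rest of that line
            found = 1
        }'
}

out=$(printf 'skip me\nnoise START here\nline two\n' | tail_from_match)
printf '%s\n' "$out"
```

    To print only what follows the match, substr($0, RSTART + RLENGTH) can be used in place of substr($0, RSTART).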
    
  • 2020-11-29 16:43

    If you are only interested in the last line of input and you expect only one match (for example, part of the summary line of a shell command), you can also try this very compact code, adapted from "How to print regexp matches using `awk`?":

    $ echo "xxx yyy zzz" | awk '{match($0,"yyy",a)}END{print a[0]}'
    yyy
    

    Or a more complex version that prints only a captured group:

    $ echo "xxx=a yyy=b zzz=c" | awk '{match($0,"yyy=([^ ]+)",a)}END{print a[1]}'
    b
    

    Warning: the three-argument form of awk's match() is a gawk extension; it is not available in mawk.

    Here is another nice solution using a lookbehind regex in grep instead of awk. It does not need gawk, but it does need a grep that supports -P (Perl-compatible regexes) and -o, such as GNU grep:

    $ echo "xxx=a yyy=b zzz=c" | grep -Po '(?<=yyy=)[^ ]+'
    b
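
    Where neither gawk nor grep -P is available, the captured value can still be pulled out with the two-argument match(), which sets RSTART and RLENGTH in any POSIX awk. A minimal sketch, assuming the same sample input; the key_len variable is introduced here for illustration only:

```shell
# Extract the value after "yyy=" using only POSIX awk features
# (two-argument match(), RSTART, RLENGTH, substr), so no gawk is needed.
# The key name yyy= and the sample input are illustrative.
value=$(echo "xxx=a yyy=b zzz=c" | awk '
    match($0, /yyy=[^ ]+/) {
        key_len = length("yyy=")
        # skip the key prefix and keep only the value part of the match
        print substr($0, RSTART + key_len, RLENGTH - key_len)
    }')
echo "$value"
```

    The offset arithmetic stands in for the capture group: the whole match is located first, then the known prefix is cut off.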
    
  • 2020-11-29 16:44

    It sounds like you are trying to emulate GNU grep's -o behaviour. This will do that, provided you only want the first match on each line:

    awk 'match($0, /regex/) {
        print substr($0, RSTART, RLENGTH)
    }
    ' file
    

    Here's an example, using GNU's awk implementation (gawk):

    awk 'match($0, /a.t/) {
        print substr($0, RSTART, RLENGTH)
    }
    ' /usr/share/dict/words | head
    act
    act
    act
    act
    aft
    ant
    apt
    art
    art
    art
    

    Read about match, substr, RSTART and RLENGTH in the awk manual.

    After that you may wish to extend this to deal with multiple matches on the same line.
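
    One way to handle several matches per line, as a minimal sketch: keep calling match() on the unconsumed remainder of the line. The function name print_all_matches and the sample text are made up for this example, and the loop assumes the regex never matches the empty string (otherwise it would not terminate).

```shell
# Print every non-overlapping match of a regex, one per line (like grep -o).
# The pattern a.t and the sample text are placeholders for illustration.
print_all_matches() {
    awk '{
        line = $0
        # match() sets RSTART and RLENGTH; keep consuming the line after each hit
        while (match(line, /a.t/)) {
            print substr(line, RSTART, RLENGTH)
            line = substr(line, RSTART + RLENGTH)
        }
    }'
}

result=$(printf 'an ant ate an apt art\nno hits here\n' | print_all_matches)
printf '%s\n' "$result"
```

    This uses only POSIX awk features, so it should behave the same in gawk and mawk.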

  • 2020-11-29 16:46

    This is the most basic form:

    awk '/pattern/{ print $0 }' file
    

    This asks awk to search for pattern using //, then print the whole matching line. By default awk calls each line a record, denoted by $0. It is worth at least reading up on this in the documentation.

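    If you only want to print the matched word, loop over the fields. Note that this compares each field with the literal string yyy rather than with a regex: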

    awk '{for(i=1;i<=NF;i++){ if($i=="yyy"){print $i} } }' file
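
    For an actual regex, the same loop can test each field with the ~ operator. A small sketch with a placeholder pattern and made-up input; note that it prints the whole field that matches, not just the matched portion of it:

```shell
# Print every whitespace-separated field that matches a regex.
# The pattern ^y+$ and the sample input are placeholders.
words=$(printf 'xxx yyy zzz\nyy xxx y\n' | awk '{
    for (i = 1; i <= NF; i++)
        if ($i ~ /^y+$/)   # regex test instead of the == string comparison
            print $i
}')
printf '%s\n' "$words"
```

    Anchoring the pattern with ^ and $ makes the field match as a whole; without the anchors, any field containing a match would be printed.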
    
  • 2020-11-29 16:47

    sed can also be elegant here. This example replaces each line with the matched group "yyy" from that line:

    $ cat testfile
    xxx yyy zzz
    yyy xxx zzz
    $ sed -rn 's#^.*(yyy).*$#\1#p' testfile
    yyy
    yyy
    

    Relevant manual page: https://www.gnu.org/software/sed/manual/sed.html#Back_002dreferences-and-Subexpressions
