How to find patterns across multiple lines using grep?

你的背包 2020-11-22 04:14

I want to find files that contain "abc" and "efg", in that order, where the two strings are on different lines of the file. For example, a file with content:

blah bl         


        
26 answers
  • 2020-11-22 04:21

    Sadly, you can't. From the grep docs:

    grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN.
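
    Because grep tests each line on its own, one possible workaround is to carry state from one line to the next with a different tool. The following is a minimal sketch using awk rather than grep; the function name abc_then_efg is hypothetical, and it assumes a POSIX awk is installed.

```shell
# Hypothetical helper: exit status 0 if "abc" occurs on one line of file $1
# and "efg" occurs on a later line, 1 otherwise.
abc_then_efg() {
  awk '
    seen && /efg/ { found = 1; exit }  # abc was already seen on an earlier line
    /abc/         { seen = 1 }         # remember that abc has appeared
    END           { exit !found }      # status 0 only when the pair was found
  ' "$1"
}

# Possible usage, listing the regular files in the current directory that match:
#   for f in ./*; do [ -f "$f" ] && abc_then_efg "$f" && printf '%s\n' "$f"; done
```

    The efg rule is checked before the abc rule, so a line that contains both strings does not count as a match by itself.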

  • 2020-11-22 04:22

    Plain grep is not sufficient for this operation.

    pcregrep, which is available on most modern Linux systems, can be used as follows:

    pcregrep -M  'abc.*(\n|.)*efg' test.txt
    

    where -M (--multiline) allows a pattern to match across more than one line.

    There is also a newer pcre2grep. Both are provided by the PCRE project.

    pcre2grep is available for macOS via MacPorts as part of the pcre2 port:

    % sudo port install pcre2 
    

    and via Homebrew as:

    % brew install pcre
    

    or for pcre2

    % brew install pcre2
    

    pcre2grep is also available on Linux (Ubuntu 18.04+):

    $ sudo apt install pcre2-utils # PCRE2
    $ sudo apt install pcregrep    # Older PCRE
    
  • 2020-11-22 04:22

    With silver searcher:

    ag 'abc.*(\n|.)*efg'
    

    This is similar to ring bearer's answer, but uses ag instead. The speed advantage of The Silver Searcher may show here.

  • 2020-11-22 04:24

    I used this to extract a fasta sequence from a multi fasta file using the -P option for grep:

    grep -Pzo ">tig00000034[^>]+"  file.fasta > desired_sequence.fasta
    
    • -P enables Perl-compatible regular expressions
    • -z makes grep treat the input as records terminated by a zero byte rather than by a newline character
    • -o prints only the matched text; otherwise grep prints the whole line, which with -z is the whole file

    The core of the regexp is [^>], which matches any character other than the greater-than sign (>), including newlines.
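
    With -z and -o, each match is written with a trailing zero byte instead of a newline. A minimal sketch of the same technique on made-up data, assuming a grep built with -P support (GNU grep); the sample records and the variable names are hypothetical:

```shell
# Hypothetical sample: three records in a multi-FASTA file.
fasta=$(mktemp)
printf '>tig1\nAATT\n>tig2\nCCGG\n>tig3\nTTAA\n' > "$fasta"

# -z reads the whole file as one record, -o prints only the match,
# and tr removes the zero byte that -z appends to each match.
record=$(grep -Pzo '>tig2[^>]+' "$fasta" | tr -d '\0')
printf '%s\n' "$record"
# prints:
# >tig2
# CCGG

rm -f "$fasta"
```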

  • 2020-11-22 04:27

    While the sed option is the simplest and easiest, LJ's one-liner is sadly not the most portable. Those stuck with a version of the C shell will need to escape the bang:

    sed -e '/abc/,/efg/\!d' [file]
    

    Unfortunately, this escaped form does not work in bash and similar shells.
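
    In bash and other Bourne-style shells, the bang needs no backslash inside single quotes. A minimal sketch of that unescaped form; the function name print_abc_to_efg is hypothetical:

```shell
# Hypothetical helper: print each run of lines in file $1 that starts at a
# line matching abc and ends at the next line matching efg. "!d" deletes
# every line outside that range. Inside single quotes the shell passes the
# "!" to sed unchanged.
print_abc_to_efg() {
  sed -e '/abc/,/efg/!d' "$1"
}
```

    Note that if no efg follows an abc, sed keeps printing to the end of the file, so non-empty output alone does not prove that both strings are present. sed -n '/abc/,/efg/p' is an equivalent spelling that avoids the bang entirely.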

  • 2020-11-22 04:27

    The file pattern *.sh is important: it keeps directories from being inspected. A test could of course prevent that too.

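    for f in *.sh
    do
      # First line containing abc, as "lineno:text"; empty if there is none.
      a=$( grep -n -m1 abc "$f" )
      # Without abc, skip the file; otherwise record the last line containing efg.
      test -n "${a}" && z=$( grep -n efg "$f" | tail -n 1 ) || continue
      # Report the file if the last efg lies on a later line than the first abc.
      (( ((${z/:*/}-${a/:*/})) > 0 )) && echo "$f"
    done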
    

    The

    grep -n -m1 abc $f 
    

    stops after at most one match (-m1) and prints its line number (-n). If a match was found (test -n ...), find the last match of efg (list all matches and take the last with tail -n 1):

    z=$( grep -n efg $f | tail -n 1)
    

    Otherwise, continue with the next file.

    Since the result looks something like 18:String alf="abc"; we need to cut away everything from the ":" to the end of the line.

    ((${z/:*/}-${a/:*/}))
    

    This yields a positive result if the last match of the second expression comes after the first match of the first.

    Then we report the filename with echo $f.
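
    The line-number comparison can be tried in isolation. A minimal sketch with made-up grep -n output; it uses the POSIX expansion ${var%%:*} in place of ${var/:*/} so that it also runs under plain sh, which is a substitution on my part rather than the code from the loop, and the sample values are hypothetical:

```shell
# Hypothetical "grep -n" results: first abc match and last efg match.
a='18:String alf="abc";'
z='42:String end="efg";'

# Strip everything from the first ":" onward, leaving the line number.
first=${a%%:*}
last=${z%%:*}

# A positive difference means efg appears on a later line than abc.
[ $((last - first)) -gt 0 ] && echo "abc precedes efg by $((last - first)) lines"
# prints: abc precedes efg by 24 lines
```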
