I want to find files that contain "abc" AND "efg" in that order, with the two strings on different lines of the file. E.g., a file with content:
blah bl
Sadly, you can't with plain grep. From the grep docs:
grep searches the named input FILEs (or standard input if no files are named, or if a single hyphen-minus (-) is given as file name) for lines containing a match to the given PATTERN.
Grep is not sufficient for this operation.
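A minimal sketch of the line-by-line behaviour the docs describe (the temporary file and its contents are made up for illustration): a pattern that spans both lines finds nothing, because grep tests each line on its own.

```shell
#!/bin/sh
# Create a throwaway file with "abc" and "efg" on separate lines.
tmp=$(mktemp)
printf 'abc\nefg\n' > "$tmp"
# grep checks one line at a time, so no single line matches abc.*efg.
# -c prints the number of matching lines (here 0; grep also exits with status 1).
count=$(grep -c 'abc.*efg' "$tmp")
echo "matching lines: $count"
rm -f "$tmp"
```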
pcregrep, which is found on most modern Linux systems, can be used as:
pcregrep -M 'abc.*(\n|.)*efg' test.txt
where -M, --multiline allows patterns to match more than one line.
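The pattern above also matches when abc and efg sit on the same line, since (\n|.)* can match without consuming any newline. Here is a sketch of a stricter variant (not from the original answer) that requires at least one line break in between, and uses -l to print only the names of matching files. The function name abc_then_efg is hypothetical; pcre2grep should accept the same -l and -M options.

```shell
#!/bin/sh
# Hypothetical helper: list files where "abc" is followed by "efg" on a LATER line.
#   abc.*\n  : "abc", the rest of its line, and the line break
#   (.*\n)*  : any number of whole lines in between
#   .*efg    : "efg" somewhere on a following line
# -M enables multiline matching; -l prints only the file names.
abc_then_efg() {
    pcregrep -l -M 'abc.*\n(.*\n)*.*efg' "$@"
}
```

Usage: abc_then_efg *.txt. On very large files, the repeated group may run into PCRE's internal limits.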
There is also a newer pcre2grep. Both are provided by the PCRE project.
pcre2grep is available for Mac OS X via MacPorts as part of the port pcre2:
% sudo port install pcre2
and via Homebrew as:
% brew install pcre
or, for pcre2:
% brew install pcre2
pcre2grep is also available on Linux (Ubuntu 18.04+):
$ sudo apt install pcre2-utils # PCRE2
$ sudo apt install pcregrep # Older PCRE
With The Silver Searcher:
ag 'abc.*(\n|.)*efg'
This is similar to ring bearer's answer, but with ag instead. The Silver Searcher's speed could be an advantage here.
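I used this to extract a FASTA sequence from a multi-FASTA file, using the -P option for grep:
grep -Pzo ">tig00000034[^>]+" file.fasta > desired_sequence.fasta
The core of the regex is [^>], which means "any character other than >". Because -z makes grep treat NUL bytes instead of newlines as line terminators, the whole file is read as a single record, so [^>]+ runs across newlines up to the next > header. The -o option prints only the matched text.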
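The same -z trick can be applied to the original question. A sketch, assuming GNU grep built with PCRE support (BSD and BusyBox grep lack -P); the helper name abc_then_efg_grep is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: list files where "efg" appears on a line after "abc".
# -z : records end in NUL bytes, so a normal text file is read as one record
# -P : Perl-compatible regex, in which \n matches a literal newline
# -l : print only the names of files that match
abc_then_efg_grep() {
    grep -lPz 'abc.*\n(.*\n)*.*efg' "$@"
}
```

As with pcregrep, the pattern requires a line break between the two strings, so "abc efg" on one line is not reported.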
While the sed option is the simplest and easiest, LJ's one-liner is sadly not the most portable. Those stuck with a version of the C shell will need to escape their bangs:
sed -e '/abc/,/efg/\!d' [file]
Unfortunately, this escaped form does not work in bash and similar shells, where the backslash inside single quotes reaches sed literally.
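For bash, a sketch of the unescaped form: in bash, ! inside single quotes is not history-expanded, so no backslash is needed. The function name print_abc_to_efg is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: keep the lines from the first line matching /abc/
# through the next line matching /efg/ and delete (d) every other line.
print_abc_to_efg() {
    sed -e '/abc/,/efg/!d' "$@"
}
```

Note that if no efg line follows, the range runs to the end of the file, so this prints the lines rather than confirming that efg exists.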
The file pattern *.sh is important because it keeps directories from being inspected. Of course, a test could prevent that too.
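for f in *.sh
do
    # first "abc" match, prefixed with its line number (e.g. 18:...)
    a=$( grep -n -m1 abc "$f" )
    # if found, take the last "efg" match; otherwise skip this file
    test -n "${a}" && z=$( grep -n efg "$f" | tail -n 1 ) || continue
    # keep only the line numbers and report the file if efg comes later
    (( ((${z/:*/}-${a/:*/})) > 0 )) && echo "$f"
done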
The command
grep -n -m1 abc $f
stops after the first match (-m1) and prefixes it with its line number (-n). If a match was found (test -n ...), we find the last match of efg (find all matches and take the last one with tail -n 1):
z=$( grep -n efg $f | tail -n 1)
Otherwise, we continue with the next file.
Since the result is something like 18:String alf="abc"; (grep does not prefix the file name when searching a single file), we need to cut away everything from the first ":" to the end of the line:
((${z/:*/}-${a/:*/}))
This should give a positive result if the last match of the second expression comes after the first match of the first.
Then we report the file name with echo $f.
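A worked sketch of the line-number arithmetic, using made-up grep -n results for one file (the values 18 and 42 are hypothetical):

```shell
#!/bin/bash
# Hypothetical grep -n output: first "abc" match and last "efg" match.
a='18:String alf="abc";'
z='42:echo "efg"'
# ${var/:*/} replaces the text from the first ":" to the end with nothing.
first=${a/:*/}
last=${z/:*/}
# A positive difference means efg occurs on a later line than abc.
if (( last - first > 0 )); then
    echo "efg appears after abc (line $first < line $last)"
fi
```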