I often need to find a particular sequence in a FASTA file and print it. For those who don't know, FASTA is a text file format for biological sequences (DNA, proteins, etc.).
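For illustration, a small file of the kind described can be created as follows. In FASTA, each record starts with a header line beginning with >, followed by one or more lines of sequence. The record names and sequence data here are made up; they are chosen only to match the names used in the commands in this thread.

```shell
# Create a hypothetical three-record FASTA file named "file" in the current directory.
# The names sequence1..sequence3 and the sequence data are illustrative assumptions.
cat > file <<'EOF'
>sequence1
ACTGACTGACTGACTG
>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG
>sequence3
ACTGACTG
EOF
```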
$ perl -0076 -lane 'print join("\n",@F) if $F[0]=~/sequence2/' file
Using sed only:
sed -n '/>sequence3/,/>/ p' file | sed '${/>/d}'
Like this maybe:
awk '/>sequence1/{p++;print;next} /^>/{p=0} p' file
So, if the line contains >sequence1, set a flag (p) to start printing, print this line and move to the next. On subsequent lines, if the line starts with >, clear the p flag to stop printing. In general, print if the flag p is set.
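One caveat: the pattern />sequence1/ also matches headers such as >sequence10. Here is a minimal sketch of the same flag idea wrapped in a shell function that compares the header ID exactly. The function name extract_seq and the awk variable id are made up for illustration, and the sketch assumes the ID is the first whitespace-delimited word of the header line.

```shell
# extract_seq NAME FILE: print the FASTA record whose header ID is exactly NAME.
# Hypothetical helper; assumes the ID is the first word after ">" on the header line.
extract_seq() {
    awk -v id="$1" '
        /^>/ { p = ($1 == ">" id) }   # on each header, set the flag only for an exact ID match
        p                             # print every line while the flag is set
    ' "$2"
}
```

For example, extract_seq sequence1 file would print only the sequence1 record, even if the file also contains a sequence10.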
Or, improving a little on your grep solution, use this to cut off the -A (after) context:
grep -A 999999 "sequence1" file | awk 'NR>1 && /^>/{exit} 1'
So, that prints the line matching sequence1 plus up to 999999 lines after it, and pipes them into awk. Awk then looks for a > at the start of any line after line 1, and exits if it finds one. Until then, the 1 causes awk to do its standard thing, which is to print the current line.
Using > as the record separator:
awk -v seq="sequence2" -v RS='>' '$1 == seq {print RS $0}' file
>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG