Printing a sequence from a fasta file

I often need to find a particular sequence in a fasta file and print it. For those who don't know, fasta is a text file format for biological sequences (DNA, proteins, etc.).

4 answers

灰色年华

2021-01-13 21:24
```
$ perl -0076 -lane 'print join("\n",@F) if $F[0]=~/sequence2/' file
```

隐瞒了意图╮

2021-01-13 21:25
Using sed only:
```
sed -n '/^>sequence3/,/^>/p' file | sed '${/^>/d;}'
```

难免孤独

2021-01-13 21:33
Like this maybe:
```
awk '/>sequence1/{p++;print;next} /^>/{p=0} p' file
```
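So, if the line contains >sequence1, set a flag (p) to start printing, print the line, and move on to the next one. On later lines, if the line starts with >, clear the flag to stop printing. The final bare p prints any line while the flag is set. Note that />sequence1/ is a substring match, so it would also match a header such as >sequence10.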
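
If IDs can share a prefix, the same flag idea can be written with an exact comparison on the first word of the header. This is only a sketch: the function name `fasta_flag` and the sample data are made up for illustration, and it assumes the ID is the first whitespace-separated word after the >.

```shell
# Hypothetical sample data, written to a temporary file for illustration.
fa=$(mktemp)
cat > "$fa" <<'EOF'
>sequence1
ACTG
ACTG
>sequence10
GGGG
>sequence2
TTTT
EOF

# fasta_flag ID FILE: print the record whose header ID equals ID exactly.
# At every header line the flag p is recomputed: 1 if the first word of
# the header is ">ID", 0 otherwise. The bare pattern p prints while p is set.
fasta_flag() {
    awk -v id="$1" '/^>/ { p = ($1 == ">" id) } p' "$2"
}

fasta_flag sequence1 "$fa"
```

Asking for sequence1 here leaves the sequence10 record out, because the comparison is on the whole first word rather than a substring.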

Or, improving a little on your grep solution, use this to cut off the -A (after) context:
```
grep -A 999999 "sequence1" file | awk 'NR>1 && /^>/{exit} 1'
```
So, grep prints the line matching sequence1 plus up to 999999 lines after it and pipes them into awk. Awk exits at the first line after line 1 that starts with >. Until then, the pattern 1 is always true, so awk does its default action, which is to print the current line.
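
The early exit can also be done inside a single awk program, which avoids guessing a large enough value for -A. A minimal sketch, assuming the ID is the first word of the header line; the function name `fasta_first` and the sample data are hypothetical:

```shell
# Hypothetical sample data, written to a temporary file for illustration.
fa=$(mktemp)
cat > "$fa" <<'EOF'
>sequence1 first record
ACTG
>sequence2
TTTT
TTTT
>sequence3
CCCC
EOF

# fasta_first ID FILE: print the first record whose header ID equals ID,
# then stop reading the file.
# On a header line: if a record was already being printed, exit;
# otherwise set found to 1 when the first word is ">ID".
fasta_first() {
    awk -v id="$1" '
        /^>/ { if (found) exit; found = ($1 == ">" id) }
        found
    ' "$2"
}

fasta_first sequence2 "$fa"
```

Because awk exits at the header that follows the match, the rest of a large file is never read.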

深忆病人

2021-01-13 21:38

Using the > as the record separator:

```
awk -v seq="sequence2" -v RS='>' '$1 == seq {print RS $0}' file
```

Output:

```
>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG
```
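
With RS set to >, each record already ends with the newline that preceded the next >, so print adds one more and leaves a blank line after the sequence. A possible variant uses printf to emit the record unchanged. This is a sketch only; the function name `fasta_rs` and the sample data are made up for illustration.

```shell
# Hypothetical sample data, written to a temporary file for illustration.
fa=$(mktemp)
cat > "$fa" <<'EOF'
>sequence1
GGGG
>sequence2
ACTGACTGACTGACTG
ACTGACTGACTGACTG
>sequence3
CCCC
EOF

# fasta_rs ID FILE: print the record whose ID equals ID, using ">" as the
# record separator. $1 is the first word of the record, i.e. the ID.
# printf puts the ">" back and writes the record as-is, so no extra
# newline is appended the way print would.
fasta_rs() {
    awk -v seq="$1" -v RS='>' '$1 == seq { printf ">%s", $0 }' "$2"
}

fasta_rs sequence2 "$fa"
```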
