Printing a sequence from a fasta file

太阳男子 2021-01-13 20:45

I often need to find a particular sequence in a fasta file and print it. For those who don't know, fasta is a text file format for biological sequences (DNA, proteins, etc.).

4 Answers
  • 2021-01-13 21:24
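    $ perl -l -0076 -ane 'print ">", join("\n",@F) if $F[0]=~/sequence2/' file

    Here -0076 makes > (octal 076) the input record separator. -l is given before it so that print appends a newline rather than a >, and the leading > that the split removes is added back to the header.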
    
  • 2021-01-13 21:25

    Using sed only:

    sed -n '/>sequence3/,/>/ p' file | sed '${/>/d;}'
    
  • 2021-01-13 21:33

    Like this maybe:

    awk '/^>sequence1/{p++;print;next} /^>/{p=0} p' file
    

    So, if the line starts with >sequence1, set a flag (p) to start printing, print this line and move to next. On subsequent lines, if the line starts with >, change p flag to stop printing. In general, print if the flag p is set.
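
    A minimal sketch of the same flag idea wrapped in a shell function, in case the header has to match exactly (so that sequence1 does not also pick up sequence10). The function name fasta_get, the file name sample.fasta and its contents are made up for illustration:

    ```shell
    # Hypothetical sample file; names and sequences are invented for this example.
    printf '%s\n' '>sequence1' 'ACTGACTG' 'ACTGACTG' \
                  '>sequence2' 'GGGGCCCC' \
                  '>sequence10' 'TTTTAAAA' > sample.fasta

    # Usage: fasta_get NAME FILE
    # Prints the record whose header starts with exactly ">NAME".
    fasta_get() {
        awk -v name="$1" '
            /^>/ { p = ($1 == (">" name)) }   # on every header, set or clear the flag
            p                                 # print the line while the flag is set
        ' "$2"
    }

    fasta_get sequence1 sample.fasta
    ```

    Comparing the first field instead of using a regex means a description after the name (">sequence1 some text") still matches, while longer names do not.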

    Or, improving a little on your grep solution, use this to cut off the -A (after) context:

    grep -A 999999 "sequence1" file | awk 'NR>1 && /^>/{exit} 1'
    

    So, that prints up to 999999 lines after sequence1 and pipes them into awk. Awk then looks for a > at the start of any line after line 1, and exits if it finds one. Until then, the 1 causes awk to do its standard thing, which is print the current line.
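
    If the arbitrary 999999 bothers you, the early exit can probably be folded into awk alone. This is only a sketch under the same assumptions as the pipeline (one matching record is wanted); fasta_first and sample.fasta are hypothetical names:

    ```shell
    # Hypothetical sample file; names and sequences are invented for this example.
    printf '%s\n' '>sequence1' 'ACTGACTG' 'ACTGACTG' \
                  '>sequence2' 'GGGGCCCC' > sample.fasta

    # Usage: fasta_first NAME FILE
    # Prints the first record named NAME and stops reading at the next header.
    fasta_first() {
        awk -v name="$1" '
            p && /^>/        { exit }    # a new header after the match: stop reading
            $1 == (">" name) { p = 1 }   # matching header: start printing
            p                            # print while the flag is set
        ' "$2"
    }

    fasta_first sequence1 sample.fasta
    ```

    Because awk exits at the next header, the rest of a large file is never read.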

  • 2021-01-13 21:38

    Using the > as the record separator:

    awk -v seq="sequence2" -v RS='>' '$1 == seq {print RS $0}' file
    
    >sequence2
    ACTGACTGACTGACTG
    ACTGACTGACTGACTG
    