Remove line breaks in a FASTA file

予麋鹿 2020-12-05 01:26

I have a FASTA file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file:

>accession1
ATGGCC         


        
9 answers
  •  有刺的猬
    2020-12-05 01:52

    There have been great responses so far.

    Here is an efficient way to do this in Python:

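    def read_fasta(fasta):
        """Return (headers, sequences), each sequence joined onto a single line."""
        headers, sequences = [], []
        with open(fasta, 'r') as fast:
            for line in fast:
                line = line.strip()
                if line.startswith('>'):
                    # Drop only the leading '>' so a '>' inside the description survives.
                    headers.append(line[1:].strip())
                    sequences.append('')
                elif line and sequences:
                    # Skip blank lines and any text that precedes the first header.
                    sequences[-1] += line
        return headers, sequences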
    
    
    def write_fasta(headers, sequences, fasta):
        """Write one header line and one unwrapped sequence line per record."""
        with open(fasta, 'w') as fast:
            for header, sequence in zip(headers, sequences):
                fast.write('>' + header + '\n' + sequence + '\n')
    

    You can use the above functions to read the headers and sequences from a FASTA file with the line breaks removed, manipulate them, and write them back out as a FASTA file.

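    headers, sequences = read_fasta('input.fasta')
    # do_something is a placeholder for your own processing step.
    new_headers = do_something(headers)
    new_sequences = do_something(sequences)
    # Note: writing to 'input.fasta' overwrites the original file.
    write_fasta(new_headers, new_sequences, 'input.fasta')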
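    If the file is too large to hold in two lists, the same idea can be sketched as a generator that yields one record at a time. This is only an illustration, not part of the answer above: the name unwrap_fasta and the demo records (everything beyond ">accession1" / "ATGGCC") are made up, and it assumes headers start with '>' and that blank lines can be ignored.

```python
import io


def unwrap_fasta(lines):
    """Yield (header, sequence) pairs with wrapped sequence lines joined.

    `lines` is any iterable of text lines, e.g. an open file handle.
    Hypothetical helper for illustration only.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith('>'):
            if header is not None:
                # A new record starts, so emit the previous one.
                yield header, ''.join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            # Sequence text before the first header is silently skipped.
            chunks.append(line)
    if header is not None:
        yield header, ''.join(chunks)  # emit the last record


# Made-up demo input; an open file could be passed instead of StringIO.
demo = io.StringIO(">accession1\nATGGCC\nCATTGA\n\n>accession2\nGGTTAA\nCC\n")
records = list(unwrap_fasta(demo))
for header, seq in records:
    print('>' + header)
    print(seq)
```

    Because records are produced one at a time, they can be written to a separate output file as they are read, which also avoids overwriting the input while it is still being processed.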
    
