I have a FASTA file where the sequences are broken up with newlines. I'd like to remove the newlines. Here's an example of my file:
>accession1
ATGGCC
There have been great responses so far.
Here is an efficient way to do this in Python:
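def read_fasta(fasta):
    """Return (headers, sequences); each sequence is joined onto one line."""
    with open(fasta, 'r') as fast:
        headers, sequences = [], []
        for line in fast:
            if line.startswith('>'):
                # Drop only the leading '>' so any '>' inside the header is kept
                head = line[1:].strip()
                headers.append(head)
                sequences.append('')
            else:
                seq = line.strip()
                # Skip blank lines and any text before the first header
                if len(seq) > 0 and sequences:
                    sequences[-1] += seq
    return (headers, sequences)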
def write_fasta(headers, sequences, fasta):
    with open(fasta, 'w') as fast:
        for i in range(len(headers)):
            fast.write('>' + headers[i] + '\n' + sequences[i] + '\n')
You can use the functions above to read a FASTA file into lists of headers and sequences with the line breaks removed, manipulate them, and write the result back to a FASTA file.
headers, sequences = read_fasta('input.fasta')
new_headers = do_something(headers)
new_sequences = do_something(sequences)
write_fasta(new_headers, new_sequences, 'input.fasta')
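
If the goal is only to remove the line breaks, as in the question, no manipulation step is needed and the file does not have to be held in memory. Below is a minimal sketch of that idea, not a definitive implementation. The function name `unwrap_fasta` and the file names in the usage line are hypothetical, and it assumes the input and output paths are different files.

```python
def unwrap_fasta(src, dst):
    """Copy src to dst with each record's sequence joined onto one line.

    Assumes src and dst are different paths (dst is truncated on open).
    """
    with open(src) as fin, open(dst, 'w') as fout:
        chunks = []  # pieces of the current record's sequence
        for line in fin:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith('>'):
                # A new record starts: flush the previous sequence first
                if chunks:
                    fout.write(''.join(chunks) + '\n')
                    chunks = []
                fout.write(line + '\n')
            else:
                chunks.append(line)
        # Flush the last record's sequence
        if chunks:
            fout.write(''.join(chunks) + '\n')
```

It could be called as `unwrap_fasta('input.fasta', 'output.fasta')`. Collecting the pieces in a list and joining them once per record avoids repeatedly rebuilding a long string, which can matter for very long sequences.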