Question
I'm trying to read a FASTA file, find a specific motif (a string pattern), and print out the sequence and the number of times the motif occurs. A FASTA file is just a series of sequences (strings). Each sequence starts with a header line, marked by a leading ">", and the sequence of letters begins on the line immediately after the header. I'm not done with the code, but so far I have the following, and it gives me this error:
AttributeError: 'str' object has no attribute 'next'
I'm not sure what's wrong here.
import re
header=""
counts=0
newline=""
f1=open('fpprotein_fasta(2).txt','r')
f2=open('motifs.xls','w')
for line in f1:
    if line.startswith('>'):
        header=line
        #print header
        nextline=line.next()
        for i in nextline:
            motif="ML[A-Z][A-Z][IV]R"
            if re.findall(motif,nextline):
                counts+=1
        #print (header+'\t'+counts+'\t'+motif+'\n')
        fout.write(header+'\t'+counts+'\t'+motif+'\n')
f1.close()
f2.close()
Answer 1:
The error is most likely coming from this line:
nextline=line.next()
Here, line is the string you have already read, and strings have no next() method.
Part of the problem is that you are mixing two different ways of reading the file: iterating over the lines with for line in f1, and calling <handle>.next().
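To illustrate the difference, here is a minimal sketch in Python 3, where the built-in next() is called on the file handle rather than on the string. The sample text and the helper name header_and_next_line are made up for illustration, and io.StringIO stands in for the real file. It assumes each sequence fits on a single line, which is often not true of real FASTA files.

```python
import io

# Hypothetical two-record FASTA text standing in for the real file.
fasta_text = ">seq1\nMLAAIRKK\n>seq2\nGGGG\n"

def header_and_next_line(handle):
    """Pair each header line with the single line that follows it."""
    pairs = []
    for line in handle:
        if line.startswith(">"):
            # next() is called on the handle (an iterator), not on the string.
            # The default "" avoids StopIteration if a header is the last line.
            following = next(handle, "")
            pairs.append((line.strip(), following.strip()))
    return pairs

pairs = header_and_next_line(io.StringIO(fasta_text))
print(pairs)
```

Because the for loop and next() advance the same iterator, the line consumed by next() is not seen again by the loop.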
Also, if you are working with FASTA files, I recommend using Biopython: it makes working with collections of sequences much easier. Chapter 14, on motifs, will be of particular interest to you. You will probably need to learn more Python to achieve what you want, but if you are going to do a lot more bioinformatics than your example shows, it is definitely worth the investment of time.
Answer 2:
This might help point you in the right direction:
import re

def parse(fasta, outfile):
    motif = "ML[A-Z][A-Z][IV]R"
    header = None
    with open(fasta, 'r') as fin, open(outfile, 'w') as fout:
        for line in fin:
            if line.startswith('>'):
                # A new record starts: write out the previous one first.
                if header is not None:
                    fout.write(header + '\t' + str(count) + '\t' + motif + '\n')
                # Strip the trailing newline so the output stays on one row.
                header = line.rstrip()
                count = 0
            else:
                # Count non-overlapping matches on this sequence line.
                matches = re.findall(motif, line)
                count += len(matches)
        # Write out the last record in the file.
        if header is not None:
            fout.write(header + '\t' + str(count) + '\t' + motif + '\n')

if __name__ == '__main__':
    parse("fpprotein_fasta(2).txt", "motifs.xls")
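One caveat: the code above searches each line separately, so a motif that is split across a line break inside a wrapped sequence would not be counted. A possible variation, sketched below, joins all the lines of a record before searching. The function name count_motif_per_record and the sample data are hypothetical, and io.StringIO stands in for the real file.

```python
import io
import re

def count_motif_per_record(handle, motif):
    """Return a list of (header, count) pairs, one per FASTA record."""
    pattern = re.compile(motif)
    results = []
    header = None
    chunks = []  # sequence lines collected for the current record
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                # Search the whole joined sequence of the previous record.
                results.append((header, len(pattern.findall("".join(chunks)))))
            header = line
            chunks = []
        elif line:
            chunks.append(line)
    if header is not None:
        # Do not forget the last record in the file.
        results.append((header, len(pattern.findall("".join(chunks)))))
    return results

# Hypothetical sample: the motif in seq1 is split across two lines.
sample = io.StringIO(">seq1\nAAML\nKKIRGG\n>seq2\nGGGG\n")
results = count_motif_per_record(sample, "ML[A-Z][A-Z][IV]R")
print(results)
```

Note that re.findall counts non-overlapping matches only, which is the same behavior as the original code.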
Answer 3:
I am not sure about the FASTA part, but I am pretty sure this line is wrong:
nextline=line.next()
Here, line is simply a str, so you can't call next() on it.
Also, for opening files, it is recommended to use:
with open('fpprotein_fasta(2).txt','r') as f1:
This will deal with closing the file automatically.
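As a small self-contained sketch of that behavior (the temporary file and its contents are made up for illustration), the handle reports itself as closed once the with block ends:

```python
import os
import tempfile

# Write a tiny hypothetical FASTA file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">seq1\nMLAAIR\n")
    path = tmp.name

# The file is open only inside this block.
with open(path, "r") as f1:
    headers = [line.strip() for line in f1 if line.startswith(">")]

# After the block, the file has been closed without an explicit close().
closed_after_block = f1.closed
print(headers, closed_after_block)

os.remove(path)  # clean up the temporary file
```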
If you provide a sample FASTA file, I can try to correct the code.
Source: https://stackoverflow.com/questions/20580657/how-to-read-a-fasta-file-in-python