Question
First, I can't use BioPython :( I need to read a bunch of DNA sequences from a FASTA file and translate them to protein sequences. The FASTA file looks like this:
>some info
ACCGGGCTAAA
>other info
ACCGCCAATTT
So I can create a function that outputs only the DNA sequence, but when I try to translate it I get the following error: "TypeError: object of type '_io.TextIOWrapper' has no len()". I have no idea how to resolve this. Any help is immensely appreciated! Also, I am taking my first Python course, so please explain any answers as if to a moron :)
#Open the file for reading
fasta=open('mRNA_database.fasta', 'r')
def readSeq(fasta):
    for line in fasta:
        if line.startswith('>'):
            continue
        line = line.strip()
        #print(line)
readSeq(fasta)
g_code=dict()
g_code = {'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'stop', 'TAG':'stop',
'TGC':'C', 'TGT':'C', 'TGA':'stop', 'TGG':'W'}
def aa_to_prt(fasta, g_code):
    prt = ''
    for i in range(0, len(fasta), 3):
        codon = fasta[i:i+3]
        prt+= g_code[codon]
    print(prt)
aa_to_prt(fasta, g_code)
Answer 1:
What is your desired output?
For input like:
some info ACCGGGCTAAA
other info ACCGCCAATTT
with code:
def readSeq():
    for line in open('mRNA_database.fasta', 'r'):
        if line.startswith('>'):
            continue
        line = line.strip()
        yield line.split(' ')[2]
g_code=dict()
g_code = {'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
'ACA':'T', 'ACC':'T', 'ACG':'T', 'ACT':'T',
'AAC':'N', 'AAT':'N', 'AAA':'K', 'AAG':'K',
'AGC':'S', 'AGT':'S', 'AGA':'R', 'AGG':'R',
'CTA':'L', 'CTC':'L', 'CTG':'L', 'CTT':'L',
'CCA':'P', 'CCC':'P', 'CCG':'P', 'CCT':'P',
'CAC':'H', 'CAT':'H', 'CAA':'Q', 'CAG':'Q',
'CGA':'R', 'CGC':'R', 'CGG':'R', 'CGT':'R',
'GTA':'V', 'GTC':'V', 'GTG':'V', 'GTT':'V',
'GCA':'A', 'GCC':'A', 'GCG':'A', 'GCT':'A',
'GAC':'D', 'GAT':'D', 'GAA':'E', 'GAG':'E',
'GGA':'G', 'GGC':'G', 'GGG':'G', 'GGT':'G',
'TCA':'S', 'TCC':'S', 'TCG':'S', 'TCT':'S',
'TTC':'F', 'TTT':'F', 'TTA':'L', 'TTG':'L',
'TAC':'Y', 'TAT':'Y', 'TAA':'stop', 'TAG':'stop',
'TGC':'C', 'TGT':'C', 'TGA':'stop', 'TGG':'W'}
def aa_to_prt(g_code):
    prt = ''
    for name in readSeq():
        codon = name[:3]
        prt += g_code[codon]
    print(prt)
aa_to_prt(g_code)
I got the output:
TT
Is that what you want?
Answer 2:
You seem to be trying to use len(filehandle) to figure out how far you can read into the file. But the handle doesn't have a length. The file might, but that's not what you are looking at. And anyway, the API is a more general one: in many cases the stream you are opening doesn't (yet) have a length. When a handle is opened, the system has no way to know how many bytes the user will type, or how many packets are going to arrive over the network.
Instead, the convention is to simply iterate over the handle until it no longer produces a value. (Behind the scenes, modern Python uses an iterator which raises a StopIteration exception when there is nothing left to read.)
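As a rough sketch of that convention, the snippet below uses io.StringIO as a stand-in for the real file. That substitution is an assumption made only so the example runs without mRNA_database.fasta on disk; a handle returned by open() behaves the same way for these operations.

```python
import io

# io.StringIO stands in for open('mRNA_database.fasta', 'r') (assumption for the demo).
handle = io.StringIO(">some info\nACCGGGCTAAA\n>other info\nACCGCCAATTT\n")

# A handle has no length, so len() fails with a TypeError, as in the question.
try:
    len(handle)
    has_len = True
except TypeError:
    has_len = False

# Iterating yields one line at a time until the handle is exhausted.
lines = [line.strip() for line in handle]

# Once exhausted, next() raises StopIteration, which is what ends a for loop.
try:
    next(handle)
    exhausted = False
except StopIteration:
    exhausted = True

print(has_len, lines, exhausted)
```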
Your readSeq function does this correctly, but you are not returning any values from it, so it simply consumes the file, and leaves you with the file handle open at the end of the file, with nothing left to read.
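To make that concrete, here is a minimal sketch comparing a function shaped like the question's readSeq with a variant that hands the sequences back. The names read_seq_no_return and read_seq are hypothetical, and io.StringIO again stands in for the real file.

```python
import io

def read_seq_no_return(handle):
    # Mirrors the question's readSeq: strips each line but never hands it back.
    for line in handle:
        if line.startswith('>'):
            continue
        line = line.strip()

def read_seq(handle):
    # Hypothetical fix: yield each sequence line so the caller receives it.
    for line in handle:
        if line.startswith('>'):
            continue
        yield line.strip()

data = ">some info\nACCGGGCTAAA\n>other info\nACCGCCAATTT\n"

handle = io.StringIO(data)
result = read_seq_no_return(handle)  # None, because nothing is returned
leftover = handle.read()             # '', because the handle is at end of file

sequences = list(read_seq(io.StringIO(data)))
print(result, repr(leftover), sequences)
```

With yield, the function becomes a generator, so the caller can loop over the sequences or collect them with list().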
Maybe try something like this instead.
def prtSeq(fastahandle):
    global g_code # as defined in your code already
    for line in fastahandle:
        if line.startswith('>'):
            continue
        line = line.strip()
        proteins = []
        # step through complete codons only; a trailing partial codon is skipped
        for seq in range(0, len(line)-2, 3):
            proteins.append(g_code[line[seq:seq+3]])
        print(''.join(proteins))
prtSeq(open('mRNA_database.fasta', 'r'))
As an aside, the initial g_code=dict() assignment is useless; you immediately overwrite the empty dictionary with a new one.
A better design would only return values to the caller for printing, but I take it you are primarily interested in getting the job done with the simplest possible code.
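One possible shape for that design is sketched below. The helper names translate and translate_fasta are hypothetical, the table is only an excerpt of the question's g_code (just enough for the sample data), and the sketch assumes each record's sequence sits on a single line, as in the sample file.

```python
import io

# Excerpt of the g_code table from the question, just enough for the sample data.
g_code = {'ACC': 'T', 'GGG': 'G', 'CTA': 'L', 'GCC': 'A', 'AAT': 'N'}

def translate(seq, table):
    # Translate complete codons only; a trailing partial codon is ignored.
    return ''.join(table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

def translate_fasta(handle, table):
    # Return one protein string per sequence line, skipping '>' header lines.
    proteins = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith('>'):
            continue
        proteins.append(translate(line, table))
    return proteins

# io.StringIO stands in for open('mRNA_database.fasta', 'r') (assumption for the demo).
handle = io.StringIO(">some info\nACCGGGCTAAA\n>other info\nACCGCCAATTT\n")
proteins = translate_fasta(handle, g_code)
for protein in proteins:
    print(protein)
```

Because the functions return values instead of printing them, the caller decides whether to print, write to a file, or process the proteins further.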
Source: https://stackoverflow.com/questions/36305314/how-to-translate-a-fasta-sequence-from-dict-how-to-make-function-output-a-strin