I am a biology graduate student, and over the past few months I taught myself a very limited amount of Python to deal with some data I have. I am not asking for homework help, this
There is one more problem in your code: when you use
stop = sequencestart.find('TAA')
you ignore the open reading frame. find returns the first 'TAA' at any offset, including one that is not a whole number of codons away from the start codon. In the code below I split the sequence into triplets and use itertools.takewhile to handle that, but it can be done with loops as well:
from itertools import takewhile

def translate_dna(sequence, codontable, stop_codons=('TAA', 'TGA', 'TAG')):
    start = sequence.find('ATG')
    # Take the sequence from the first start codon
    trimmed_sequence = sequence[start:]
    # Split it into triplets
    codons = [trimmed_sequence[i:i+3] for i in range(0, len(trimmed_sequence), 3)]
    print(len(codons))
    print(trimmed_sequence)
    print(codons)
    # Take all codons until the first stop codon
    coding_sequence = takewhile(lambda x: x not in stop_codons and len(x) == 3, codons)
    # Translate and join into a string
    protein_sequence = ''.join([codontable[codon] for codon in coding_sequence])
    # This line assumes there is always a stop codon in the sequence
    return "{0}_".format(protein_sequence)
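For comparison, here is a minimal sketch of the loop-based variant mentioned above. The names translate_dna_loop and TOY_CODON_TABLE are hypothetical, and the table is only a four-entry subset of the standard genetic code. Returning None when there is no start codon, and adding the trailing "_" only when a stop codon was actually reached, are choices made for this sketch and not part of the original answer.

```python
# Hypothetical toy table: a small subset of the standard genetic code,
# just enough for the example sequence below.
TOY_CODON_TABLE = {"ATG": "M", "GCT": "A", "AAA": "K", "TTT": "F"}
STOP_CODONS = ("TAA", "TGA", "TAG")


def translate_dna_loop(sequence, codontable, stop_codons=STOP_CODONS):
    """Translate from the first ATG, reading in frame, until a stop codon."""
    start = sequence.find("ATG")
    if start == -1:
        return None  # no start codon (assumed behaviour for this sketch)

    protein = []
    found_stop = False
    # Step three bases at a time to stay in the reading frame set by ATG;
    # the range bound skips any trailing partial codon.
    for i in range(start, len(sequence) - 2, 3):
        codon = sequence[i:i + 3]
        if codon in stop_codons:
            found_stop = True
            break
        protein.append(codontable[codon])

    # Mark the stop with "_" only if one was actually reached.
    return "".join(protein) + ("_" if found_stop else "")


# 'TAA' occurs out of frame here (the T of GCT plus the first two As of AAA),
# so str.find('TAA') reports a position that is not a real in-frame stop.
seq = "ATGGCTAAATTTTAG"
out_of_frame_hit = seq.find("TAA")
protein = translate_dna_loop(seq, TOY_CODON_TABLE)
```

In this example find reports index 5, which is not a multiple of three from the start codon, while the in-frame scan reads on to the real stop codon TAG.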
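Your problem stems from the line
if cds[n:n+3] in codontable == True
This always evaluates to False, and thus you never append to proteinsequence. Python chains comparison operators, so the condition is read as (cds[n:n+3] in codontable) and (codontable == True), and the codon table itself is not equal to True. Just remove the == True portion, like so:
if cds[n:n+3] in codontable
and you will get the protein sequence. Also, make sure to return proteinsequence in translate_dna().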
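The effect of the == True comparison can be checked in isolation. This is a small sketch with a hypothetical two-entry codontable; it relies only on Python's rule that comparison operators, including in, are chained.

```python
# Hypothetical two-entry table, only for demonstrating the comparison.
codontable = {"ATG": "M", "GCT": "A"}
codon = "ATG"

# Because comparisons chain, the first expression is evaluated like the second.
chained = codon in codontable == True  # noqa: E712
explicit = (codon in codontable) and (codontable == True)  # noqa: E712

# Without "== True" the membership test behaves as intended.
fixed = codon in codontable

print(chained, explicit, fixed)
```

The first two values are False even though the codon is in the table, because a dictionary is never equal to True; only the plain membership test is True.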