I'm creating a Chrome Extension that converts a string of nucleotides of length nlen into the corresponding amino acids.
I've done something similar to this before in Python, but as I'm still very new to JavaScript, I'm struggling to translate that same logic from Python to JavaScript. The code I have so far is below:
function translateInput(n_seq) {
// code to translate goes here
// length of input nucleotide sequence
var nlen = n_seq.length
// declare initially empty amino acids string
var aa_seq = ""
// iterate over each chunk of three characters/nucleotides
// to match it with the correct codon
for (var i = 0; i < nlen; i++) {
aa_seq.concat(codon)
}
// return final string of amino acids
return aa_seq
}
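Two details in the loop above are worth checking before reading the answers. The loop steps one character at a time (i++) rather than one codon at a time, and String.prototype.concat returns a new string instead of changing aa_seq in place (codon is also never defined). A minimal illustration of both points:

```javascript
// Strings in JavaScript are immutable: concat() builds and returns a new string.
let aa_seq = "";
aa_seq.concat("K");            // the returned "K" is discarded; aa_seq is still ""
console.log(aa_seq.length);    // 0

aa_seq = aa_seq.concat("K");   // keep the result by reassigning
aa_seq += "H";                 // += is the more common shorthand
console.log(aa_seq);           // KH

// Stepping by 3 visits one codon per iteration instead of one character.
const n_seq = "AAGCAT";
const chunks = [];
for (let i = 0; i < n_seq.length; i += 3) {
  chunks.push(n_seq.slice(i, i + 3));
}
console.log(chunks);           // [ 'AAG', 'CAT' ]
```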
I know that I want to iterate over characters three at a time, match them to the correct amino acid, and then continuously concatenate that amino acid to the output string of amino acids (aa_seq), returning that string once the loop is complete.
I also tried creating a dictionary of the codon to amino acid relationships and was wondering if there was a way to use something like that as a tool to match the three character codons to their respective amino acids:
const codon_dictionary = {
"A": ["GCA","GCC","GCG","GCT"],
"C": ["TGC","TGT"],
"D": ["GAC", "GAT"],
"E": ["GAA","GAG"],
"F": ["TTC","TTT"],
"G": ["GGA","GGC","GGG","GGT"],
"H": ["CAC","CAT"],
"I": ["ATA","ATC","ATT"],
"K": ["AAA","AAG"],
"L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
"M": ["ATG"],
"N": ["AAC","AAT"],
"P": ["CCA","CCC","CCG","CCT"],
"Q": ["CAA","CAG"],
"R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
"S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
"T": ["ACA","ACC","ACG","ACT"],
"V": ["GTA","GTC","GTG","GTT"],
"W": ["TGG"],
"Y": ["TAC","TAT"],
};
EDIT: An example of an input string of nucleotides would be "AAGCATAGAAATCGAGGG", with the corresponding output string "KHRNRG". Hope this helps!
Opinion
The first thing I would personally recommend is to build a dictionary that goes from a 3-character codon to an amino acid. This will allow your program to take several chains of codon strings and convert them to amino acid strings without having to do expensive deep lookups every time. The dictionary will work something like this:
codonDict['GCA'] // 'A'
codonDict['TGC'] // 'C'
// etc
From there, I implemented two utility functions: slide and slideStr. These aren't particularly important, so I'll just cover them with a couple of examples of input and output.
slide (2,1) ([1,2,3,4])
// [[1,2], [2,3], [3,4]]
slide (2,2) ([1,2,3,4])
// [[1,2], [3,4]]
slideStr (2,1) ('abcd')
// ['ab', 'bc', 'cd']
slideStr (2,2) ('abcd')
// ['ab', 'cd']
With the reverse dictionary and generic utility functions at our disposal, writing codon2amino is a breeze:
// codon2amino :: String -> String
const codon2amino = str =>
slideStr(3,3)(str)
.map(c => codonDict[c])
.join('')
Runnable demo
To clarify, we build codonDict based on aminoDict once, and reuse it for every codon-to-amino computation.
// your original data renamed to aminoDict
const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };
// codon dictionary derived from aminoDict
const codonDict =
Object.keys(aminoDict).reduce((dict, a) =>
Object.assign(dict, ...aminoDict[a].map(c => ({[c]: a}))), {})
// slide :: (Int, Int) -> [a] -> [[a]]
const slide = (n,m) => xs => {
if (n > xs.length)
return []
else
return [xs.slice(0,n), ...slide(n,m) (xs.slice(m))]
}
// slideStr :: (Int, Int) -> String -> [String]
const slideStr = (n,m) => str =>
slide(n,m) (Array.from(str)) .map(s => s.join(''))
// codon2amino :: String -> String
const codon2amino = str =>
slideStr(3,3)(str)
.map(c => codonDict[c])
.join('')
console.log(codon2amino('AAGCATAGAAATCGAGGG'))
// KHRNRG
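If the extension only needs to run on engines with ES2019 support (recent Chrome versions qualify), the same reverse dictionary could also be built with Object.entries, Array.prototype.flatMap and Object.fromEntries instead of reduce plus Object.assign. This is a swapped-in alternative rather than the answer's code, sketched with an aminoDict trimmed to the amino acids in the example input:

```javascript
// Trimmed-down aminoDict for illustration; use the full table in practice.
const aminoDict = {
  G: ['GGA', 'GGC', 'GGG', 'GGT'],
  H: ['CAC', 'CAT'],
  K: ['AAA', 'AAG'],
  N: ['AAC', 'AAT'],
  R: ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGT'],
};

// Turn every [amino, [codon, ...]] entry into [codon, amino] pairs,
// flatten them into one list, and build an object from those pairs.
const codonDict = Object.fromEntries(
  Object.entries(aminoDict).flatMap(([amino, codons]) =>
    codons.map(codon => [codon, amino])
  )
);

console.log(codonDict['AAG']); // K
console.log(codonDict['CGA']); // R
```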
Further explanation
Can you clarify what some of these variables are supposed to represent? (n, m, xs, c, etc.)
Our slide function gives us a sliding window over an array. It expects two parameters for the window, n (the window size) and m (the step size), and one parameter that is the array of items to iterate through, xs, which can be read as "x's", the plural of x, as in a collection of x items.
slide is purposefully generic: it works on any array-like xs that has a length property and a slice method. That means it can work with an Array, a String, or a typed array; a general iterable that only implements Symbol.iterator (a Set, for example) has to be converted first, e.g. with Array.from. That's also why we use a generic name like xs: naming it something specific pigeonholes us into thinking it can only work with a specific type.
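Because slide only relies on a length property and a slice method, it can be called directly on a string as well as on an array. The sketch below repeats slide as defined in the demo and shows both cases; a Set is iterable but has neither property, so it is converted with Array.from first.

```javascript
// slide :: (Int, Int) -> [a] -> [[a]]   (same definition as in the demo)
// Note: m must be at least 1, otherwise xs never shrinks and the recursion never ends.
const slide = (n, m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0, n), ...slide(n, m)(xs.slice(m))]
}

console.log(slide(2, 2)([1, 2, 3, 4]))         // [ [ 1, 2 ], [ 3, 4 ] ]
console.log(slide(3, 3)('AAGCATAGA'))          // [ 'AAG', 'CAT', 'AGA' ]

// A Set has no length or slice, so convert it to an array first.
const codonSet = new Set(['AAG', 'CAT', 'AGA'])
console.log(slide(2, 1)(Array.from(codonSet))) // [ [ 'AAG', 'CAT' ], [ 'CAT', 'AGA' ] ]
```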
Other things, like the variable c in .map(c => codonDict[c]), are not particularly important. I named it c for codon, but we could've named it x or foo; it doesn't matter. The "trick" to understanding c is to understand .map.
[1,2,3,4,5].map(c => f(c))
// [f(1), f(2), f(3), f(4), f(5)]
So really all we're doing here is taking an array ([1,2,3,4,5]) and making a new array where we call f for each element in the original array.
Now when we look at .map(c => codonDict[c]), we understand that all we're doing is looking up c in codonDict for each element:
const codon2amino = str =>
slideStr(3,3)(str) // [ 'AAG', 'CAT', 'AGA', 'AAT', ...]
.map(c => codonDict[c]) // [ codonDict['AAG'], codonDict['CAT'], codonDict['AGA'], codonDict['AAT'], ...]
.join('') // 'KHRN...'
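One consequence of the map/join pipeline worth knowing: if a codon is missing from codonDict (for example a stop codon such as TAA, which the question's dictionary does not list), the lookup yields undefined, and Array.prototype.join turns undefined into an empty string. So codon2amino silently skips such codons, whereas building the string with += would insert the text "undefined". A small demonstration with a trimmed table:

```javascript
// Lookups for codons missing from the table return undefined.
const codonDict = { AAG: 'K', CAT: 'H' }  // trimmed table for illustration
const aminos = ['AAG', 'TAA', 'CAT'].map(c => codonDict[c])
console.log(aminos)            // [ 'K', undefined, 'H' ]

// join() treats undefined (and null) as an empty string...
console.log(aminos.join(''))   // KH

// ...while += converts undefined to the text "undefined".
let concatenated = ''
for (const aa of aminos) concatenated += aa
console.log(concatenated)      // KundefinedH
```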
Also, are these 'const' items able to essentially replace my original translateInput() function?
If you're not familiar with ES6 (ES2015), some of the syntax used above might seem foreign to you.
// foo using traditional function syntax
function foo (x) { return x + 1 }
// foo as an arrow function (an alternative to the line above;
// declaring both in the same scope would throw a SyntaxError)
const foo = x => x + 1
So in short, yes, codon2amino is the exact replacement for your translateInput, just defined using a const binding and an arrow function. I chose codon2amino as a name because it better describes the operation of the function: translateInput doesn't say which way it's translating (A to B, or B to A?), and "input" is sort of a senseless descriptor here because all functions can take input.
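For comparison, here is a sketch of the same pipeline written with traditional function declarations instead of const bindings and arrow functions. It keeps the question's name translateInput and should behave like codon2amino; the codonDict here is trimmed to the codons in the example input, so it is for illustration only.

```javascript
// Trimmed-down reverse dictionary (codon -> amino acid) for illustration only.
var codonDict = { AAG: 'K', CAT: 'H', AGA: 'R', AAT: 'N', CGA: 'R', GGG: 'G' }

// slide(n, m)(xs): windows of size n, advancing m items each time
function slide (n, m) {
  return function (xs) {
    if (n > xs.length) return []
    return [xs.slice(0, n)].concat(slide(n, m)(xs.slice(m)))
  }
}

// slideStr(n, m)(str): the same sliding window, returning strings
function slideStr (n, m) {
  return function (str) {
    return slide(n, m)(Array.from(str)).map(function (s) { return s.join('') })
  }
}

// Equivalent to codon2amino, under the original name
function translateInput (n_seq) {
  return slideStr(3, 3)(n_seq)
    .map(function (c) { return codonDict[c] })
    .join('')
}

console.log(translateInput('AAGCATAGAAATCGAGGG')) // KHRNRG
```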
The reason you're seeing other const declarations is that we're splitting up the work of your function into multiple functions. The reasons for this are mostly beyond the scope of this answer, but the brief explanation is that one specialized function that takes on the responsibility of several tasks is less useful to us than multiple generic functions that can be combined and reused in sensible ways.
Sure, codon2amino needs to look at each 3-letter sequence in the input string, but that doesn't mean we have to write the string-splitting code inside of the codon2amino function. We can write a generic string-splitting function like slideStr, which is useful to any function that wants to iterate through string sequences, and then have our codon2amino function use it. If we encapsulated all of that string-splitting code inside of codon2amino, the next time we needed to iterate through string sequences, we'd have to duplicate that portion of the code.
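As an illustration of that reuse, the same slideStr can produce consecutive codons or every overlapping three-letter window just by changing the step size, without touching codon2amino. The definitions are repeated (slide written as a one-line ternary) so the sketch runs on its own:

```javascript
const slide = (n, m) => xs =>
  n > xs.length ? [] : [xs.slice(0, n), ...slide(n, m)(xs.slice(m))]

const slideStr = (n, m) => str =>
  slide(n, m)(Array.from(str)).map(s => s.join(''))

const seq = 'AAGCAT'

// Step 3: consecutive codons, as used by codon2amino
console.log(slideStr(3, 3)(seq))  // [ 'AAG', 'CAT' ]

// Step 1: every overlapping three-letter window
console.log(slideStr(3, 1)(seq))  // [ 'AAG', 'AGC', 'GCA', 'CAT' ]
```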
All that said...
Is there any way I can do this while keeping my original for loop structure?
I really think you should spend some time stepping through the code above to see how it works. There are a lot of valuable lessons to learn there if you haven't yet seen program concerns separated in this way.
Of course, that's not the only way to solve your problem, though. We can use a primitive for loop. For me, it's more mental overhead to think about creating iterator variables like i, manually incrementing i++ or i += 3, making sure to check i < str.length, reassigning the return value with result += something, etc. Add a couple more variables and your brain quickly turns to soup.
function makeCodonDict (aminoDict) {
let result = {}
for (let k of Object.keys(aminoDict))
for (let a of aminoDict[k])
result[a] = k
return result
}
function translateInput (dict, str) {
let result = ''
for (let i = 0; i < str.length; i += 3)
result += dict[str.slice(i, i + 3)] // slice() rather than the deprecated substr()
return result
}
const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };
const codonDict = makeCodonDict(aminoDict)
const codons = 'AAGCATAGAAATCGAGGG'
const aminos = translateInput(codonDict, codons)
console.log(aminos) // KHRNRG
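To stay even closer to the question's original code, a sketch that keeps translateInput(n_seq), nlen and aa_seq might look like the following. It assumes a codonDict lookup table built as above (trimmed here for brevity), steps by 3, reassigns aa_seq, and appends "?" (an arbitrary placeholder, not part of any answer) for any chunk missing from the table, including a trailing one- or two-letter fragment, instead of the text "undefined".

```javascript
// Trimmed-down codon -> amino acid table for illustration only.
const codonDict = { AAG: 'K', CAT: 'H', AGA: 'R', AAT: 'N', CGA: 'R', GGG: 'G' }

function translateInput (n_seq) {
  // length of input nucleotide sequence
  const nlen = n_seq.length
  // declare initially empty amino acids string
  let aa_seq = ''
  // step through the input one codon (three characters) at a time
  for (let i = 0; i < nlen; i += 3) {
    const codon = n_seq.slice(i, i + 3)
    // '?' is an arbitrary placeholder for chunks missing from the table
    aa_seq += codonDict[codon] || '?'
  }
  // return final string of amino acids
  return aa_seq
}

console.log(translateInput('AAGCATAGAAATCGAGGG')) // KHRNRG
console.log(translateInput('AAGCA'))              // K?
```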
Also, you can write the above answer (@guest271314) in a compact form:
var res = ''
str.match(/.{1,3}/g).forEach(s => {
var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
res += key != undefined ? key : ''
})
You can see the complete answer below:
const codon_dictionary = {
"A": ["GCA","GCC","GCG","GCT"],
"C": ["TGC","TGT"],
"D": ["GAC", "GAT"],
"E": ["GAA","GAG"],
"F": ["TTC","TTT"],
"G": ["GGA","GGC","GGG","GGT"],
"H": ["CAC","CAT"],
"I": ["ATA","ATC","ATT"],
"K": ["AAA","AAG"],
"L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
"M": ["ATG"],
"N": ["AAC","AAT"],
"P": ["CCA","CCC","CCG","CCT"],
"Q": ["CAA","CAG"],
"R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
"S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
"T": ["ACA","ACC","ACG","ACT"],
"V": ["GTA","GTC","GTG","GTT"],
"W": ["TGG"],
"Y": ["TAC","TAT"],
};
const str = "AAGCATAGAAATCGAGGG";
let res = "";
// just rewrite the above code into the short answer
str.match(/.{1,3}/g).forEach(s => {
var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
res += key != undefined ? key : ''
})
console.log(res);
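The nested filter calls above can be written more directly with Array.prototype.find and Array.prototype.includes, which stop at the first match. Like the compact answer, this still searches the amino acid table for every codon, so building a reverse codon dictionary once remains the cheaper option for long sequences. A sketch with a trimmed table:

```javascript
// Trimmed-down amino acid -> codons table for illustration only.
const codon_dictionary = {
  G: ['GGA', 'GGC', 'GGG', 'GGT'],
  H: ['CAC', 'CAT'],
  K: ['AAA', 'AAG'],
  N: ['AAC', 'AAT'],
  R: ['AGA', 'AGG', 'CGA', 'CGC', 'CGG', 'CGT'],
}

const str = 'AAGCATAGAAATCGAGGG'
let res = ''
// match() returns null when there are no matches, so fall back to []
for (const s of str.match(/.{1,3}/g) || []) {
  // first amino acid whose codon list contains s (undefined if none)
  const key = Object.keys(codon_dictionary).find(k => codon_dictionary[k].includes(s))
  res += key !== undefined ? key : ''
}
console.log(res) // KHRNRG
```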
Mh, I would recommend first of all changing the shape of your dictionary: keyed by amino acid, it isn't very useful for looking up codons, so let's invert it:
const dict = {
"A": ["GCA","GCC","GCG","GCT"],
"C": ["TGC","TGT"],
"D": ["GAC", "GAT"],
"E": ["GAA","GAG"],
"F": ["TTC","TTT"],
"G": ["GGA","GGC","GGG","GGT"],
"H": ["CAC","CAT"],
"I": ["ATA","ATC","ATT"],
"K": ["AAA","AAG"],
"L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
"M": ["ATG"],
"N": ["AAC","AAT"],
"P": ["CCA","CCC","CCG","CCT"],
"Q": ["CAA","CAG"],
"R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
"S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
"T": ["ACA","ACC","ACG","ACT"],
"V": ["GTA","GTC","GTG","GTT"],
"W": ["TGG"],
"Y": ["TAC","TAT"],
}
const codons = Object.keys(dict).reduce((a, b) => {dict[b].forEach(v => a[v] = b); return a}, {})
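//In practice, you will get an object equivalent to this literal
//(named codonsExpanded here so it doesn't clash with the const codons above):
const codonsExpanded = { GCA: 'A',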
GCC: 'A',
GCG: 'A',
GCT: 'A',
TGC: 'C',
TGT: 'C',
GAC: 'D',
GAT: 'D',
GAA: 'E',
GAG: 'E',
TTC: 'F',
TTT: 'F',
GGA: 'G',
GGC: 'G',
GGG: 'G',
GGT: 'G',
CAC: 'H',
CAT: 'H',
ATA: 'I',
ATC: 'I',
ATT: 'I',
AAA: 'K',
AAG: 'K',
CTA: 'L',
CTC: 'L',
CTG: 'L',
CTT: 'L',
TTA: 'L',
TTG: 'L',
ATG: 'M',
AAC: 'N',
AAT: 'N',
CCA: 'P',
CCC: 'P',
CCG: 'P',
CCT: 'P',
CAA: 'Q',
CAG: 'Q',
AGA: 'R',
AGG: 'R',
CGA: 'R',
CGC: 'R',
CGG: 'R',
CGT: 'R',
AGC: 'S',
AGT: 'S',
TCA: 'S',
TCC: 'S',
TCG: 'S',
TCT: 'S',
ACA: 'T',
ACC: 'T',
ACG: 'T',
ACT: 'T',
GTA: 'V',
GTC: 'V',
GTG: 'V',
GTT: 'V',
TGG: 'W',
TAC: 'Y',
TAT: 'Y' }
//Now we are reasoning!
//From here on, it is pretty straightforward:
const rnaParser = s => s.match(/.{3}/g).map(fragment => codons[fragment]).join("")
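A usage sketch for rnaParser follows. One caveat: String.prototype.match returns null rather than an empty array when nothing matches (for example an input shorter than three characters), so this version adds a || [] fallback. Like the original, it ignores a trailing one- or two-letter fragment, because /.{3}/g only matches complete triplets. The table is trimmed for brevity.

```javascript
// Trimmed-down codon -> amino acid table for illustration only.
const codons = { AAG: 'K', CAT: 'H', AGA: 'R', AAT: 'N', CGA: 'R', GGG: 'G' }

// Same idea as rnaParser, guarded against match() returning null.
const rnaParser = s =>
  (s.match(/.{3}/g) || []).map(fragment => codons[fragment]).join('')

console.log(rnaParser('AAGCATAGAAATCGAGGG')) // KHRNRG
console.log(rnaParser('AAGCA'))              // K
console.log(rnaParser('AA') === '')          // true
```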
You can use a for loop with String.prototype.slice() to step through the string three characters at a time from the beginning, a for..of loop over Object.entries() to iterate the properties and values of the codon_dictionary object, and Array.prototype.includes() to check whether the current three-character portion of the input string is in the array stored at a codon_dictionary property. When it is, concatenate that property to the result string.
const codon_dictionary = {
"A": ["GCA","GCC","GCG","GCT"],
"C": ["TGC","TGT"],
"D": ["GAC", "GAT"],
"E": ["GAA","GAG"],
"F": ["TTC","TTT"],
"G": ["GGA","GGC","GGG","GGT"],
"H": ["CAC","CAT"],
"I": ["ATA","ATC","ATT"],
"K": ["AAA","AAG"],
"L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
"M": ["ATG"],
"N": ["AAC","AAT"],
"P": ["CCA","CCC","CCG","CCT"],
"Q": ["CAA","CAG"],
"R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
"S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
"T": ["ACA","ACC","ACG","ACT"],
"V": ["GTA","GTC","GTG","GTT"],
"W": ["TGG"],
"Y": ["TAC","TAT"],
};
const [entries, n] = [Object.entries(codon_dictionary), 3];
let [str, res] = ["AAGCATAGAAATCGAGGG", ""];
for (let i = 0; i + n <= str.length; i += n)
for (const [key, prop, curr = str.slice(i, i + n)] of entries)
if (prop.includes(curr)) {res += key; break;};
console.log(res);
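The inner loop header above, for (const [key, prop, curr = str.slice(i, i + n)] of entries), relies on a destructuring default: each entry from Object.entries has only two elements, so the third binding, curr, is always missing and falls back to str.slice(i, i + n), which is re-evaluated for every entry. A sketch of that behaviour, followed by a plainer equivalent that computes the chunk once per codon in the outer loop (trimmed table for illustration):

```javascript
// A destructuring default applies when the array has no element at that position.
const [key, prop, curr = 'fallback'] = ['K', ['AAA', 'AAG']]
console.log(curr) // fallback

// Plain equivalent of the answer's loop, computing the chunk in the outer loop.
const entries = Object.entries({ K: ['AAA', 'AAG'], H: ['CAC', 'CAT'] })
const str = 'CATAAG'
const n = 3
let res = ''
for (let i = 0; i + n <= str.length; i += n) {
  const chunk = str.slice(i, i + n)      // current three-letter codon
  for (const [aa, codons] of entries) {
    if (codons.includes(chunk)) { res += aa; break }
  }
}
console.log(res) // HK
```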
Source: https://stackoverflow.com/questions/43725419/converting-nucleotides-to-amino-acids-using-javascript