Converting nucleotides to amino acids using JavaScript

旧巷老猫 submitted on 2019-12-05 18:01:48

Opinion

The first thing I would personally recommend is to build a dictionary that maps each 3-character codon to its amino acid. This lets your program convert any number of codon strings to amino acid strings without an expensive deep lookup for every codon. The dictionary will work something like this:

codonDict['GCA'] // 'A'
codonDict['TGC'] // 'C'
// etc
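
To make the difference between a "deep lookup" and a dictionary lookup concrete, here is a minimal sketch. It uses a two-entry sample table rather than the full one, and the names sampleAminoDict, findAminoSlow and sampleCodonDict are made up for this illustration.

```javascript
// Small sample of the amino -> codons table (not the full table)
const sampleAminoDict = { A: ['GCA', 'GCC'], C: ['TGC', 'TGT'] }

// Deep lookup: scans the codon lists of every amino on each call
const findAminoSlow = codon =>
  Object.keys(sampleAminoDict)
    .find(amino => sampleAminoDict[amino].includes(codon))

// Reverse dictionary: built once, then each lookup is one property access
const sampleCodonDict = {}
for (const [amino, codonList] of Object.entries(sampleAminoDict))
  for (const codon of codonList)
    sampleCodonDict[codon] = amino

console.log(findAminoSlow('TGC'))    // C
console.log(sampleCodonDict['TGC'])  // C
```

Both give the same answer; the second one just does the scanning work a single time, up front.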

From there, I implemented two utility functions: slide and slideStr. These aren't particularly important, so I'll just cover them with a couple examples of input and output.

slide (2,1) ([1,2,3,4])
// [[1,2], [2,3], [3,4]]

slide (2,2) ([1,2,3,4])
// [[1,2], [3,4]]

slideStr (2,1) ('abcd')
// ['ab', 'bc', 'cd']

slideStr (2,2) ('abcd')
// ['ab', 'cd']

With the reverse dictionary and generic utility functions at our disposal, writing codon2amino is a breeze

// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')

Runnable demo

To clarify, we build codonDict based on aminoDict once, and re-use it for every codon-to-amino computation.

// your original data renamed to aminoDict
const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };

// codon dictionary derived from aminoDict
const codonDict =
 Object.keys(aminoDict).reduce((dict, a) =>
   Object.assign(dict, ...aminoDict[a].map(c => ({[c]: a}))), {})

// slide :: (Int, Int) -> [a] -> [[a]]
const slide = (n,m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0,n), ...slide(n,m) (xs.slice(m))]
}

// slideStr :: (Int, Int) -> String -> [String]
const slideStr = (n,m) => str =>
  slide(n,m) (Array.from(str)) .map(s => s.join(''))

// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')

console.log(codon2amino('AAGCATAGAAATCGAGGG'))
// KHRNRG

Further explanation

can you clarify what some of these variables are supposed to represent? (n, m, xs, c, etc)

Our slide function gives us a sliding window over an array. It expects two parameters for the window – n the window size, and m the step size – and one parameter that is the array of items to iterate thru – xs, which can be read as x's, or plural x, as in a collection of x items

slide is purposefully generic in that it isn't tied to one type of xs. As written, it relies only on xs.length and xs.slice, so it works with an Array or a String; any other iterable (anything that implements Symbol.iterator) can be converted with Array.from first, as slideStr does. That's also why we use a generic name like xs, because naming it something specific pigeonholes us into thinking it can only work with a specific type
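
As a quick check of how generic slide is, the sketch below repeats the definition from the demo and applies it to an array, to a string, and to a Set. The Set case is my own addition for illustration: a Set has neither .length nor .slice, so it is converted with Array.from first. Note that the step m is assumed to be at least 1, otherwise the recursion never ends.

```javascript
// slide as defined in the demo above
const slide = (n, m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0, n), ...slide(n, m)(xs.slice(m))]
}

console.log(slide(2, 1)([1, 2, 3, 4]))  // [ [ 1, 2 ], [ 2, 3 ], [ 3, 4 ] ]
console.log(slide(3, 3)('AAGCAT'))      // [ 'AAG', 'CAT' ]

// A Set is iterable but has no .length or .slice, so convert it first
const bases = new Set(['A', 'C', 'G', 'T'])
console.log(slide(2, 2)(Array.from(bases)))  // [ [ 'A', 'C' ], [ 'G', 'T' ] ]
```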

Other things like the variable c in .map(c => codonDict[c]) are not particularly important – I named it c for codon, but we could've named it x or foo, it doesn't matter. The "trick" to understanding c is to understand .map.

[1,2,3,4,5].map(c => f(c))
// [f(1), f(2), f(3), f(4), f(5)]

So really all we're doing here is taking an array ([1,2,3,4,5]) and making a new array where we call f for each element in the original array.

Now when we look at .map(c => codonDict[c]) we understand that all we're doing is looking up c in codonDict for each element

const codon2amino = str =>
  slideStr(3,3)(str)          // [ 'AAG', 'CAT', 'AGA', 'AAT', ...]
    .map(c => codonDict[c])   // [ codonDict['AAG'], codonDict['CAT'], codonDict['AGA'], codonDict['AAT'], ...]
    .join('')                 // 'KHRN...'

Also, are these 'const' items able to essentially replace my original translateInput() function?

If you're not familiar with ES6 (ES2015), some of the syntax used above might seem foreign to you.

// foo using traditional function syntax
function foo (x) { return x + 1 }

// foo as an arrow function
const foo = x => x + 1
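
The utility functions in this answer also use a curried form such as (n,m) => xs => ..., where one arrow function returns another. Here is a small sketch of how that reads next to the traditional syntax; add, addTraditional and addTen are names invented for this example.

```javascript
// add written with traditional function syntax
function addTraditional (n) {
  return function (x) { return x + n }
}

// the same function written with arrow functions
const add = n => x => x + n

console.log(addTraditional(3)(4))  // 7
console.log(add(3)(4))             // 7

// supplying only the first argument gives a reusable function
const addTen = add(10)
console.log([1, 2, 3].map(addTen))  // [ 11, 12, 13 ]
```

This is the same reason slideStr(3,3) can be called once and the result applied to any string.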

So in short, yes, codon2amino is the exact replacement for your translateInput, just defined using a const binding and an arrow function. I chose codon2amino as a name because it better describes the operation of the function – translateInput doesn't say which way it's translating (A to B, or B to A?), and "input" is sort of a senseless descriptor here because all functions can take input.

The reason you're seeing other const declarations is because we're splitting up the work of your function into multiple functions. The reasons for this are mostly beyond the scope of this answer, but the brief explanation is that one specialized function that takes on the responsibility of several tasks is less useful to us than multiple generic functions that can be combined/re-used in sensible ways.

Sure, codon2amino needs to look at each 3-letter sequence in the input string, but that doesn't mean we have to write the string-splitting code inside of the codon2amino function. We can write a generic string-splitting function, like we did with slideStr, which is useful to any function that wants to iterate thru string sequences, and then have our codon2amino function use it. If we encapsulated all of that string-splitting code inside of codon2amino, the next time we needed to iterate thru string sequences, we'd have to duplicate that portion of the code.
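
To illustrate that kind of re-use, here is a sketch that puts slideStr to work on a different task: counting overlapping 2-letter windows in a sequence. countWindows is a hypothetical helper, not part of the original question or answer; slide and slideStr are repeated from the demo so the snippet runs on its own.

```javascript
// slide and slideStr as defined in the demo above
const slide = (n, m) => xs =>
  n > xs.length ? [] : [xs.slice(0, n), ...slide(n, m)(xs.slice(m))]

const slideStr = (n, m) => str =>
  slide(n, m)(Array.from(str)).map(s => s.join(''))

// countWindows :: (Int, Int) -> String -> Object   (hypothetical helper)
const countWindows = (n, m) => str =>
  slideStr(n, m)(str).reduce((counts, w) => {
    counts[w] = (counts[w] || 0) + 1
    return counts
  }, {})

console.log(countWindows(2, 1)('AAGAAG'))
// { AA: 2, AG: 2, GA: 1 }
```

None of the windowing logic had to be rewritten; only the part that is specific to counting is new.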


All that said..

Is there any way I can do this while keeping my original for loop structure?

I really think you should spend some time stepping thru the code above to see how it works. There are a lot of valuable lessons to learn there if you haven't yet seen program concerns separated in this way.

Of course that's not the only way to solve your problem tho. We can use a primitive for loop. For me it's more mental overhead to think about creating an index i, manually incrementing it with i++ or i += 3, making sure to check i < str.length, reassigning the result with result += something, etc. Add a couple more variables and your brain quickly turns to soup.

function makeCodonDict (aminoDict) {
  let result = {}
  for (let k of Object.keys(aminoDict))
    for (let a of aminoDict[k])
      result[a] = k
  return result
}

function translateInput (dict, str) {
  let result = ''
  for (let i = 0; i < str.length; i += 3)
    result += dict[str.slice(i, i + 3)]
  return result
}

const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };
const codonDict = makeCodonDict(aminoDict)

const codons = 'AAGCATAGAAATCGAGGG'
const aminos = translateInput(codonDict, codons)
console.log(aminos) // KHRNRG
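
One thing to watch with the loop version: the table has no entries for the stop codons TAA, TAG and TGA, and result += undefined appends the literal text "undefined". The same happens for a trailing fragment shorter than three characters. Below is a minimal sketch of a guarded variant. translateStrict, the '?' placeholder and the three-entry sample table are assumptions made for this illustration, not something from the original question.

```javascript
// Small sample of the codon -> amino table (not the full table)
const sampleCodonDict = { AAG: 'K', CAT: 'H', AGA: 'R' }

// translateStrict is a hypothetical variant of translateInput
function translateStrict (dict, str, unknown = '?') {
  let result = ''
  // stop before a trailing fragment shorter than 3 characters
  for (let i = 0; i + 3 <= str.length; i += 3) {
    const amino = dict[str.slice(i, i + 3)]
    // mark codons missing from the table instead of appending "undefined"
    result += amino === undefined ? unknown : amino
  }
  return result
}

console.log(translateStrict(sampleCodonDict, 'AAGCATTAAAG'))  // KH?
```

Whether an unknown codon should be skipped, marked, or treated as an error depends on what you need downstream; the placeholder is just one option.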

Also, you can write @guest271314's answer in a more compact form:

var res = ''
str.match(/.{1,3}/g).forEach(s => {
    var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
    res += key != undefined ? key : ''
})

The complete example follows.

const codon_dictionary = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
};

const str = "AAGCATAGAAATCGAGGG";

let res = "";
// the same loop as in the compact form shown earlier
str.match(/.{1,3}/g).forEach(s => {
    var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
    res += key != undefined ? key : ''
})

console.log(res);
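
The nested filter calls build intermediate arrays just to take the first element. As a possible alternative, here is a sketch of the same lookup using Array.prototype.find and Array.prototype.includes, which stop at the first match. It runs against a two-entry sample table, and sampleDictionary, input and out are names chosen for this example.

```javascript
// Small sample of the amino -> codons table (not the full table)
const sampleDictionary = { K: ['AAA', 'AAG'], H: ['CAC', 'CAT'] }

const input = 'AAGCATTAA'
let out = ''
// match returns null when nothing matches, so fall back to an empty array
for (const s of input.match(/.{1,3}/g) || []) {
  // find stops at the first amino whose codon list contains s
  const key = Object.keys(sampleDictionary)
    .find(x => sampleDictionary[x].includes(s))
  out += key !== undefined ? key : ''
}

console.log(out)  // KH
```

TAA is not in the sample table, so it contributes nothing to the result, just as in the filter version.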

First of all, I would recommend changing the shape of your dictionary. Keyed by amino acid, it isn't very useful for this lookup, so let's invert it:

const dict = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
}
const codons = Object.keys(dict).reduce((a, b) => {dict[b].forEach(v => a[v] = b); return a}, {})

// In practice, codons now holds the following (shown for illustration only; don't declare it a second time):

const codons = { GCA: 'A',
  GCC: 'A',
  GCG: 'A',
  GCT: 'A',
  TGC: 'C',
  TGT: 'C',
  GAC: 'D',
  GAT: 'D',
  GAA: 'E',
  GAG: 'E',
  TTC: 'F',
  TTT: 'F',
  GGA: 'G',
  GGC: 'G',
  GGG: 'G',
  GGT: 'G',
  CAC: 'H',
  CAT: 'H',
  ATA: 'I',
  ATC: 'I',
  ATT: 'I',
  AAA: 'K',
  AAG: 'K',
  CTA: 'L',
  CTC: 'L',
  CTG: 'L',
  CTT: 'L',
  TTA: 'L',
  TTG: 'L',
  ATG: 'M',
  AAC: 'N',
  AAT: 'N',
  CCA: 'P',
  CCC: 'P',
  CCG: 'P',
  CCT: 'P',
  CAA: 'Q',
  CAG: 'Q',
  AGA: 'R',
  AGG: 'R',
  CGA: 'R',
  CGC: 'R',
  CGG: 'R',
  CGT: 'R',
  AGC: 'S',
  AGT: 'S',
  TCA: 'S',
  TCC: 'S',
  TCG: 'S',
  TCT: 'S',
  ACA: 'T',
  ACC: 'T',
  ACG: 'T',
  ACT: 'T',
  GTA: 'V',
  GTC: 'V',
  GTG: 'V',
  GTT: 'V',
  TGG: 'W',
  TAC: 'Y',
  TAT: 'Y' }

// Now we're talking!

// From here on, it is pretty straightforward:

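// match returns null when the string holds no complete codon, so fall back to an empty array
const rnaParser = s => (s.match(/.{3}/g) || []).map(fragment => codons[fragment]).join("")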
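
A short usage sketch for this approach. It uses a six-entry sample of the codons table, and parseCodons is a hypothetical stand-in for rnaParser written so the snippet runs on its own. Because the pattern is .{3} rather than .{1,3}, a trailing fragment of one or two characters is silently dropped.

```javascript
// Small sample of the codon -> amino table (not the full table)
const sampleCodons = { AAG: 'K', CAT: 'H', AGA: 'R', AAT: 'N', CGA: 'R', GGG: 'G' }

// parseCodons is a hypothetical stand-in for rnaParser
const parseCodons = s =>
  (s.match(/.{3}/g) || [])                     // null when there is no full codon
    .map(fragment => sampleCodons[fragment])   // undefined for unknown codons
    .join('')                                  // join renders undefined as ''

console.log(parseCodons('AAGCATAGAAATCGAGGG'))  // KHRNRG
console.log(parseCodons('AAGCA'))               // K
console.log(parseCodons('AA') === '')           // true
```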

You can use a for loop with String.prototype.slice() to step through the string three characters at a time from the beginning; a for..of loop over Object.entries() to iterate the properties and values of the codon_dictionary object; and Array.prototype.includes() to check whether the current three-character portion of the input string appears in the array stored at that property. When it does, concatenate the property name to a string variable.

const codon_dictionary = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
};

const [entries, n] = [Object.entries(codon_dictionary), 3];

let [str, res] = ["AAGCATAGAAATCGAGGG", ""];

for (let i = 0; i + n <= str.length; i += n)
  for (const [key, prop, curr = str.slice(i, i + n)] of entries) 
    if (prop.includes(curr)) { res += key; break; }

console.log(res);