Huge string replace in JavaScript?

Posted by 我的未来我决定 on 2019-12-07 15:02:24

Question


I've got a small JavaScript application that parses files the user drops into the browser. Recently I discovered an issue with some non-English characters. The files dropped here use the Windows-1252 character set, so characters such as ñ actually come through as Ã±, and I must convert them all to the proper characters.

For example, I get SeÃ±or, which should be Señor in Spanish.

I've found an extremely useful website with the collection of the characters, and their counterparts that I need to convert to.

I've condensed that down into two JavaScript arrays:

var toReplace = ["Ã€", "Ã", "Ã‚", "Ãƒ", "Ã„", "Ã…", "Ã†", "Ã‡", "Ãˆ", "Ã‰", "ÃŠ", "Ã‹", "ÃŒ", "Ã", "ÃŽ", "Ã", "Ã", "Ã‘", "Ã’", "Ã“", "Ã”", "Ã•", "Ã–", "Ã—", "Ã˜", "Ã™", "Ãš", "Ã›", "Ãœ", "Ã", "Ãž", "ÃŸ", "Ã\u00A0", "Ã¡", "Ã¢", "Ã£", "Ã¤", "Ã¥", "Ã¦", "Ã§", "Ã¨", "Ã©", "Ãª", "Ã«", "Ã¬", "Ã\u00AD", "Ã®", "Ã¯", "Ã°", "Ã±", "Ã²", "Ã³", "Ã´", "Ãµ", "Ã¶", "Ã·", "Ã¸", "Ã¹", "Ãº", "Ã»", "Ã¼", "Ã½", "Ã¾", "Ã¿"];
var replaceWith = ["À", "Á", "Â", "Ã", "Ä", "Å", "Æ", "Ç", "È", "É", "Ê", "Ë", "Ì", "Í", "Î", "Ï", "Ð", "Ñ", "Ò", "Ó", "Ô", "Õ", "Ö", "×", "Ø", "Ù", "Ú", "Û", "Ü", "Ý", "Þ", "ß", "à", "á", "â", "ã", "ä", "å", "æ", "ç", "è", "é", "ê", "ë", "ì", "í", "î", "ï", "ð", "ñ", "ò", "ó", "ô", "õ", "ö", "÷", "ø", "ù", "ú", "û", "ü", "ý", "þ", "ÿ"];

What would be the most efficient way to replace every character sequence in a paragraph that appears in toReplace with its counterpart (same index) in replaceWith?

I'm hoping this won't be too loop-heavy since it's not uncommon to drop over 100 files into this application that already does some heavy looping & parsing.

Perhaps there is a better way to do this instead of keeping these characters in arrays?

EDIT - I just realized I might need to replace with the Unicode equivalent instead. Here's an array of the Unicode characters in the same order:

var unicodeReplaceWith= ["\u00C0", "\u00C1", "\u00C2", "\u00C3", "\u00C4", "\u00C5", "\u00C6", "\u00C7", "\u00C8", "\u00C9", "\u00CA", "\u00CB", "\u00CC", "\u00CD", "\u00CE", "\u00CF", "\u00D0", "\u00D1", "\u00D2", "\u00D3", "\u00D4", "\u00D5", "\u00D6", "\u00D7", "\u00D8", "\u00D9", "\u00DA", "\u00DB", "\u00DC", "\u00DD", "\u00DE", "\u00DF", "\u00E0", "\u00E1", "\u00E2", "\u00E3", "\u00E4", "\u00E5", "\u00E6", "\u00E7", "\u00E8", "\u00E9", "\u00EA", "\u00EB", "\u00EC", "\u00ED", "\u00EE", "\u00EF", "\u00F0", "\u00F1", "\u00F2", "\u00F3", "\u00F4", "\u00F5", "\u00F6", "\u00F7", "\u00F8", "\u00F9", "\u00FA", "\u00FB", "\u00FC", "\u00FD", "\u00FE", "\u00FF"];

Answer 1:


I don't know much about speed in JavaScript, or why this can't be configured correctly on the server, but here's one way to do it.

Interactive Demo

First we turn everything into an object, so we can look up translations.

var map = {};
for (var i=0; i<toReplace.length; i++) {
  map[toReplace[i]] = replaceWith[i];
}

Then we join our keys into a regular expression
(note: they must be sorted longest-first, code in the demo).

var expression = new RegExp(toReplace.join("|"), "g");
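The longest-first ordering from the note could be sketched roughly as below. This is not the demo's actual code: `escapeRegExp` and `buildExpression` are hypothetical helper names, and the three-entry `sampleKeys` array is a made-up stand-in for `toReplace`. Escaping is included only as a precaution in case a key ever contains a regex metacharacter.

```javascript
// Hypothetical helper: escape regex metacharacters so each key is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build one alternation with longer keys first, so that a
// short key such as "Ã" cannot shadow a longer key that starts with it.
function buildExpression(keys) {
  var sorted = keys.slice().sort(function (a, b) {
    return b.length - a.length;
  });
  return new RegExp(sorted.map(escapeRegExp).join("|"), "g");
}

// Sample keys only; in the answer this would be the full toReplace array.
var sampleKeys = ["Ã", "Ã±", "Ã©"];
var sampleExpression = buildExpression(sampleKeys);
var matches = "SeÃ±or Ã".match(sampleExpression);
```

Without the sort, the alternation would try "Ã" first and never reach "Ã±", which is why the ordering matters.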

In the replace callback, we swap each match for its replacement. This is as simple as looking it up in our map.

function doReplace(source) {
  return source.replace(expression, function(m) {
    return map[m];
  });
}

var result = doReplace("SeÃ±or");
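Since the map and the expression only need to be built once, the same function can then be reused for every dropped file, and each string is scanned once by the regular expression rather than once per array entry. A self-contained sketch of that usage, assuming a hypothetical two-entry subset of the arrays and made-up file contents (all `subset*` names and `fixText` are illustrative, not from the original post):

```javascript
// Hypothetical subset of the question's arrays, for illustration only.
var subsetToReplace = ["Ã±", "Ã©"];
var subsetReplaceWith = ["\u00F1", "\u00E9"]; // ñ, é

// Build the lookup map and the expression once, up front.
var subsetMap = {};
for (var i = 0; i < subsetToReplace.length; i++) {
  subsetMap[subsetToReplace[i]] = subsetReplaceWith[i];
}
var subsetExpression = new RegExp(subsetToReplace.join("|"), "g");

// Same shape as doReplace in the answer, bound to the subset map.
function fixText(source) {
  return source.replace(subsetExpression, function (m) {
    return subsetMap[m];
  });
}

// Made-up contents standing in for the text of several dropped files.
var fileContents = ["SeÃ±or", "cafÃ©", "plain ASCII"];
var fixed = fileContents.map(fixText);
```

Strings that contain none of the keys, like the last one here, pass through unchanged.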


Source: https://stackoverflow.com/questions/18222665/huge-string-replace-in-javascript
