Efficiently replace all accented characters in a string?

后端 未结 21 2594
别跟我提以往
别跟我提以往 2020-11-22 04:35

For a poor man\'s implementation of near-collation-correct sorting on the client side I need a JavaScript function that does efficient single character rep

21条回答
  •  -上瘾入骨i
    2020-11-22 05:21

    Long time ago I did this in Java and found someone else's solution based on a single string that captures part of the Unicode table that was important for the conversion - the rest was converted to ? or any other replacement character. So I tried to convert it to JavaScript. Mind that I'm no JS expert. :-)

    TAB_00C0 = "AAAAAAACEEEEIIII" +
        "DNOOOOO*OUUUUYIs" +
        "aaaaaaaceeeeiiii" +
        "?nooooo/ouuuuy?y" +
        "AaAaAaCcCcCcCcDd" +
        "DdEeEeEeEeEeGgGg" +
        "GgGgHhHhIiIiIiIi" +
        "IiJjJjKkkLlLlLlL" +
        "lLlNnNnNnnNnOoOo" +
        "OoOoRrRrRrSsSsSs" +
        "SsTtTtTtUuUuUuUu" +
        "UuUuWwYyYZzZzZzF";
    
    function stripDiacritics(source) {
        var result = source.split('');
        for (var i = 0; i < result.length; i++) {
            var c = source.charCodeAt(i);
            if (c >= 0x00c0 && c <= 0x017f) {
                result[i] = String.fromCharCode(TAB_00C0.charCodeAt(c - 0x00c0));
            } else if (c > 127) {
                result[i] = '?';
            }
        }
        return result.join('');
    }
    
    stripDiacritics("Šupa, čo? ľšťčžýæøåℌð")
    

    This converts most of latin1+2 Unicode characters. It is not able to translate single char to multiple. I don't know its performance on JS, in Java this is by far the fastest of common solutions (6-50x), there is no map, there is no regex, nothing. It produces strict ASCII output, potentially with a loss of information, but the size of the output matches the input.

    I tested the snippet with http://www.webtoolkitonline.com/javascript-tester.html and it produced Supa, co? lstczyaoa?? as expected.

提交回复
热议问题