Efficiently replace all accented characters in a string?

后端未结

关注

 21  2594

别跟我提以往 2020-11-22 04:35

For a poor man\'s implementation of near-collation-correct sorting on the client side I need a JavaScript function that does efficient single character rep

21条回答

-上瘾入骨i (楼主)

2020-11-22 05:21
Long time ago I did this in Java and found someone else's solution based on a single string that captures part of the Unicode table that was important for the conversion - the rest was converted to ? or any other replacement character. So I tried to convert it to JavaScript. Mind that I'm no JS expert. :-)
```
TAB_00C0 = "AAAAAAACEEEEIIII" +
    "DNOOOOO*OUUUUYIs" +
    "aaaaaaaceeeeiiii" +
    "?nooooo/ouuuuy?y" +
    "AaAaAaCcCcCcCcDd" +
    "DdEeEeEeEeEeGgGg" +
    "GgGgHhHhIiIiIiIi" +
    "IiJjJjKkkLlLlLlL" +
    "lLlNnNnNnnNnOoOo" +
    "OoOoRrRrRrSsSsSs" +
    "SsTtTtTtUuUuUuUu" +
    "UuUuWwYyYZzZzZzF";

function stripDiacritics(source) {
    var result = source.split('');
    for (var i = 0; i < result.length; i++) {
        var c = source.charCodeAt(i);
        if (c >= 0x00c0 && c <= 0x017f) {
            result[i] = String.fromCharCode(TAB_00C0.charCodeAt(c - 0x00c0));
        } else if (c > 127) {
            result[i] = '?';
        }
    }
    return result.join('');
}

stripDiacritics("Šupa, čo? ľšťčžýæøåℌð")
```
This converts most of latin1+2 Unicode characters. It is not able to translate single char to multiple. I don't know its performance on JS, in Java this is by far the fastest of common solutions (6-50x), there is no map, there is no regex, nothing. It produces strict ASCII output, potentially with a loss of information, but the size of the output matches the input.

I tested the snippet with http://www.webtoolkitonline.com/javascript-tester.html and it produced Supa, co? lstczyaoa?? as expected.
0 讨论(0)

查看其它21个回答
发布评论:

提交评论
- 加载中...