Remove accents/diacritics in a string in JavaScript

轻奢々 2020-11-21 13:29

How do I remove accents from the characters in a string, especially in IE6? I had something like this:

accentsTidy = function(s){
    var r=s.toLowerCase();
           


        
29 answers
  •  眼角桃花
    2020-11-21 13:59

    With ES2015/ES6 String.prototype.normalize():

    const str = "Crème Brulée";
    str.normalize("NFD").replace(/[\u0300-\u036f]/g, ""); // "Creme Brulee"
    

    Two things are happening here:

    1. Normalizing to the NFD Unicode normal form decomposes each precomposed character into a base character followed by its combining marks. The è of Crème ends up expressed as e + U+0300 (combining grave accent).
    2. A regex character class matching the range U+0300 to U+036F then removes the diacritics globally. The Unicode standard conveniently groups these code points in the Combining Diacritical Marks block.
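
The two steps above can be wrapped in a small helper. This is only a sketch: the name `removeDiacritics` is hypothetical and not part of the original answer, and it assumes an engine that supports `String.prototype.normalize()` (so not IE6 without a polyfill).

```javascript
// Hypothetical helper name, introduced here for illustration only.
function removeDiacritics(input) {
  // Step 1: NFD splits each precomposed letter into base letter + combining marks.
  // Step 2: the regex drops every code point in U+0300..U+036F.
  return input.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

console.log(removeDiacritics("Crème Brulée")); // "Creme Brulee"

// A precomposed è (U+00E8) becomes two code units after decomposition.
console.log("\u00e8".normalize("NFD").length); // 2

// Letters with no canonical decomposition, such as Ł (U+0141), pass through unchanged.
console.log(removeDiacritics("Łódź")); // "Łodz"
```

Note the last case: this approach only strips marks that NFD separates from the base letter, so characters like Ł or ø would still need an explicit mapping.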

    See comment for performance testing.

    Alternatively, if you just want sorting:

    Intl.Collator has sufficient support (~95% right now). A polyfill is also available, but I haven't tested it.

    const c = new Intl.Collator();
    ["creme brulee", "crème brulée", "crame brulai", "crome brouillé",
     "creme brulay", "creme brulfé", "creme bruléa"].sort(c.compare);
    // ["crame brulai", "creme brulay", "creme bruléa", "creme brulee",
    //  "crème brulée", "creme brulfé", "crome brouillé"]
    
    
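    // For comparison, the default sort orders by UTF-16 code units,
    // so "crème" lands after "crome" (è is U+00E8, which is greater than o).
    ["creme brulee", "crème brulée", "crame brulai", "crome brouillé"].sort();
    // ["crame brulai", "creme brulee", "crome brouillé", "crème brulée"]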
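
If the goal is comparison rather than sorting, the same API can be asked to ignore accents. The following is a minimal sketch, not something the original answer shows: the helper name `sameIgnoringAccents` and the choice of the `"en"` locale are assumptions made for illustration. It relies on the `sensitivity: "base"` option of `Intl.Collator`, which treats strings that differ only in accents or letter case as equal.

```javascript
// Assumption: "en" locale; pick the locale that matches your data.
const baseCollator = new Intl.Collator("en", { sensitivity: "base" });

// Hypothetical helper name, introduced here for illustration only.
function sameIgnoringAccents(a, b) {
  // compare() returns 0 when the collator considers the strings equal.
  return baseCollator.compare(a, b) === 0;
}

console.log(sameIgnoringAccents("crème brulée", "Creme Brulee")); // true
console.log(sameIgnoringAccents("creme", "crome"));               // false
```

This leaves the original strings untouched, which may be preferable when the accents should still be displayed.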
    
