How do I remove diacritics (accents) from a string in .NET?

南方客 2020-11-21 05:44

I'm trying to convert some strings that are in French Canadian and, basically, I'd like to be able to take out the French accent marks in the letters while keeping the lett

20 answers
  •  Happy的楠姐
    2020-11-21 05:57

    I often use an extension method based on another version I found here (see "Replacing characters in C# (ascii)"). A quick explanation:

    • Normalizing to form D splits characters like è into an e and a nonspacing ` (the combining grave accent, U+0300)
    • From this, the nonspacing characters are removed
    • The result is normalized back to form C (I'm not sure if this is necessary)

    Code:

    using System.Linq;
    using System.Text;
    using System.Globalization;
    
    // namespace here
    public static class Utility
    {
        // Query-syntax version.
        public static string RemoveDiacritics(this string str)
        {
            if (null == str) return null;
            var chars =
                // Form D separates each base letter from its combining marks.
                from c in str.Normalize(NormalizationForm.FormD).ToCharArray()
                let uc = CharUnicodeInfo.GetUnicodeCategory(c)
                // Keep everything except the nonspacing marks (the accents).
                where uc != UnicodeCategory.NonSpacingMark
                select c;

            // Recompose whatever is left into form C.
            var cleanStr = new string(chars.ToArray()).Normalize(NormalizationForm.FormC);

            return cleanStr;
        }

        // or, alternatively, the same logic in method syntax
        public static string RemoveDiacritics2(this string str)
        {
            if (null == str) return null;
            var chars = str
                .Normalize(NormalizationForm.FormD)
                .ToCharArray()
                .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                .ToArray();

            return new string(chars).Normalize(NormalizationForm.FormC);
        }
    }
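
    To make the three steps easier to follow, here is a rough sketch of the same idea written as a plain loop with a StringBuilder instead of LINQ, plus a few sample calls. The names DiacriticsDemo and Strip are made up for this illustration and are not part of any library; treat it as one possible variant rather than the definitive way to do it.

    ```csharp
    using System;
    using System.Globalization;
    using System.Text;

    public static class DiacriticsDemo
    {
        // Hypothetical helper name, used only for this illustration.
        public static string Strip(string text)
        {
            if (text == null) return null;

            // Step 1: decompose, so "è" becomes "e" followed by U+0300.
            string decomposed = text.Normalize(NormalizationForm.FormD);

            // Step 2: copy every character except the nonspacing marks.
            var sb = new StringBuilder(decomposed.Length);
            foreach (char c in decomposed)
            {
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                {
                    sb.Append(c);
                }
            }

            // Step 3: recompose whatever is left into form C.
            return sb.ToString().Normalize(NormalizationForm.FormC);
        }

        public static void Main()
        {
            Console.WriteLine(Strip("crème brûlée"));   // creme brulee
            Console.WriteLine(Strip("Où est l'île ?")); // Ou est l'ile ?
            // "ø" has no canonical decomposition, so this string comes back unchanged.
            Console.WriteLine(Strip("smørrebrød"));
        }
    }
    ```

    Note that this approach only strips marks that form D actually separates from the base letter. Letters such as ø or ł are single code points with no decomposition, so they pass through untouched and would need an explicit replacement table if you want them mapped to o or l.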
    
