Remove diacritical marks (ń ǹ ň ñ ṅ ņ ṇ ṋ ṉ ̈ ɲ ƞ ᶇ ɳ ȵ) from Unicode chars

故里飘歌 2020-11-22 11:42

I am looking for an algorithm that can map characters with diacritics (tilde, circumflex, caret, umlaut, caron) to their "simple" base character.

For example:

12 answers
  •  失恋的感觉
    2020-11-22 11:45

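    For future reference, here is a C# extension method that removes accents:

    using System.Diagnostics;
    using System.Globalization;
    using System.Linq;
    using System.Text;

    public static class StringExtensions
    {
        // Decompose to NFD so each accent becomes a separate combining mark,
        // then drop every character in the NonSpacingMark category.
        public static string RemoveDiacritics(this string str)
        {
            return new string(
                str.Normalize(NormalizationForm.FormD)
                    .Where(c => CharUnicodeInfo.GetUnicodeCategory(c) !=
                                UnicodeCategory.NonSpacingMark)
                    .ToArray());
        }
    }

    public static class Program
    {
        static void Main()
        {
            var input = "ŃŅŇ ÀÁÂÃÄÅ ŢŤţť Ĥĥ àáâãäå ńņň";
            var output = input.RemoveDiacritics();
            Debug.Assert(output == "NNN AAAAAA TTtt Hh aaaaaa nnn");
        }
    }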
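
One caveat worth sketching: decomposition only helps for characters that Unicode defines as a base letter plus a combining mark. Letters such as ɲ, ƞ, ᶇ, ɳ and ȵ from the question title (and ł or ø) have no canonical decomposition, so the NFD approach leaves them unchanged. The variant below is a minimal sketch under that assumption. `DiacriticsSketch`, `Strip` and the `Fallback` table are hypothetical names introduced for illustration, and the table is deliberately incomplete; which replacements are appropriate depends on the application.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class DiacriticsSketch
{
    // Hypothetical, incomplete fallback table for letters that NFD
    // does not split into "base letter + combining mark".
    private static readonly Dictionary<char, string> Fallback =
        new Dictionary<char, string>
        {
            { 'ɲ', "n" }, { 'ƞ', "n" }, { 'ᶇ', "n" }, { 'ɳ', "n" }, { 'ȵ', "n" },
            { 'ł', "l" }, { 'Ł', "L" }, { 'ø', "o" }, { 'Ø', "O" },
        };

    public static string Strip(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));

        var sb = new StringBuilder(text.Length);
        foreach (char c in text.Normalize(NormalizationForm.FormD))
        {
            // Skip combining marks produced by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Substitute letters that have no decomposition, if we know them.
            if (Fallback.TryGetValue(c, out string replacement))
                sb.Append(replacement);
            else
                sb.Append(c);
        }

        // Recompose whatever is left (for example Hangul syllables,
        // which NFD splits into jamo that are not non-spacing marks).
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

For example, `DiacriticsSketch.Strip("Łódź ñandú")` should give `"Lodz nandu"`. The final `FormC` step is an addition that the extension method above does not perform; it only matters for text that NFD decomposes into something other than non-spacing marks.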
    
