How do I remove diacritics (accents) from a string in .NET?

南方客 2020-11-21 05:44

I'm trying to convert some strings that are in French Canadian and, basically, I'd like to be able to take out the French accent marks in the letters while keeping the lett

20 answers
  • 2020-11-21 06:15

    What this person said:

    Encoding.ASCII.GetString(Encoding.GetEncoding(1251).GetBytes(text));

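    As far as I can tell, this does not actually split anything. å is a single character (code point U+00E5, not U+0061 followed by the combining ring U+030A, which would look the same). Code page 1251 has no å, so the encoder's best-fit fallback substitutes the closest character it does have, a plain a, and the ASCII decoding then passes that byte through unchanged, leaving only the a.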
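
    The distinction above between the single character å (U+00E5) and a followed by a combining mark (U+0061 U+030A) can also be used directly, without going through a code page. The following is a minimal sketch of that alternative, not the method from this answer: it decomposes the string with string.Normalize(NormalizationForm.FormD) and drops the combining marks. The class and method names (DiacriticsSketch, RemoveDiacritics) are hypothetical.

    ```csharp
    using System;
    using System.Globalization;
    using System.Text;

    public static class DiacriticsSketch
    {
        // Hypothetical helper, not part of the original answer.
        public static string RemoveDiacritics(string text)
        {
            // FormD splits precomposed letters into base letter + combining mark,
            // e.g. "å" (U+00E5) becomes "a" (U+0061) followed by U+030A.
            string decomposed = text.Normalize(NormalizationForm.FormD);

            var result = new StringBuilder(decomposed.Length);
            foreach (char c in decomposed)
            {
                // Combining accents are in the NonSpacingMark category; skip them.
                if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
                {
                    result.Append(c);
                }
            }

            // Recompose anything that is still decomposed.
            return result.ToString().Normalize(NormalizationForm.FormC);
        }

        public static void Main()
        {
            // prints "Voce esta numa situacao lamentavel"
            Console.WriteLine(RemoveDiacritics("Você está numa situação lamentável"));
        }
    }
    ```

    Letters that have no canonical decomposition, such as ø or ł, are left untouched by this sketch, so it is not a drop-in replacement for the code page trick in every case.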

  • 2020-11-21 06:18

    The Greek (ISO) code page can do it.

    Information about this code page is available from System.Text.Encoding.GetEncodings(). See: https://msdn.microsoft.com/pt-br/library/system.text.encodinginfo.getencoding(v=vs.110).aspx

    Greek (ISO) has code page 28597 and the name iso-8859-7.

    On to the code... \o/

    string text = "Você está numa situação lamentável";
    
    string textEncode = System.Web.HttpUtility.UrlEncode(text, Encoding.GetEncoding("iso-8859-7"));
    //result: "Voce+esta+numa+situacao+lamentavel"
    
    string textDecode = System.Web.HttpUtility.UrlDecode(textEncode);
    //result: "Voce esta numa situacao lamentavel"
    

    So, write this function...

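    // Relies on the encoder replacing characters that iso-8859-7 lacks with their
    // closest match (for example "ç" with "c"); this fallback may vary by runtime.
    public string RemoveAcentuation(string text)
    {
        return
            System.Web.HttpUtility.UrlDecode(
                System.Web.HttpUtility.UrlEncode(
                    text, System.Text.Encoding.GetEncoding("iso-8859-7")));
    }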
    

    Note that Encoding.GetEncoding("iso-8859-7") is equivalent to Encoding.GetEncoding(28597): the first overload takes the encoding's name, the second its code page number.
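
    To check the name/code page equivalence yourself, something like the sketch below should work. It assumes .NET Core 3.0 or later (or .NET 5+), where legacy code pages such as iso-8859-7 have to be registered through CodePagesEncodingProvider before GetEncoding can find them; on .NET Framework the registration line is not needed. The class and method names are hypothetical.

    ```csharp
    using System;
    using System.Text;

    public static class EncodingLookupSketch
    {
        // Hypothetical helper: true when the name and the code page number
        // resolve to an encoding with the same code page.
        public static bool NameAndCodePageMatch()
        {
            // Assumption: running on .NET Core / .NET 5+, where legacy code
            // pages are only available after registering this provider.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

            Encoding byName = Encoding.GetEncoding("iso-8859-7");
            Encoding byCodePage = Encoding.GetEncoding(28597);

            return byName.CodePage == byCodePage.CodePage;
        }

        public static void Main()
        {
            Console.WriteLine(NameAndCodePageMatch());
        }
    }
    ```

    This only confirms that both lookups land on the same encoding. Whether that encoding then maps accented letters to their base letters depends on its fallback behavior, which is worth testing on the runtime you actually target.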
