How to properly decode accented characters for display

Submitted on 2019-12-12 12:23:38

Question


My raw input text file contains a string:

Caf&eacute (Should be Café)

The text file is UTF-8 encoded.

The output, let's say, goes to another text file, so it's not necessarily for a web page.

What C# method(s) can I use to output the correct format, Café?

Apparently a common problem?


Answer 1:


Have you tried System.Web.HttpUtility.HtmlDecode("Caf&eacute;")? It returns 538M results




Answer 2:


This is HTML encoded text. You need to decode it:

string decoded = HttpUtility.HtmlDecode(text);

UPDATE: The French symbol "é" has the HTML code "&eacute;" (with a trailing semicolon), so you need to fix your input string, which is missing that semicolon.
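
As a rough illustration of "fixing the input string", the sketch below appends the missing semicolon before decoding. It is not from the original answer: the class name LenientEntityDecoder and the regular expression are hypothetical, and it uses System.Net.WebUtility.HtmlDecode (the counterpart of HttpUtility.HtmlDecode that does not need a System.Web reference). It assumes only named entities such as &eacute are affected and repairs a match only when adding ";" yields an entity the decoder recognises, so text like "AT&T" is left alone.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answers.
public static class LenientEntityDecoder
{
    // Matches "&name" when it is NOT followed by another letter or by ';',
    // i.e. a named entity whose terminating semicolon is missing.
    private static readonly Regex UnterminatedEntity =
        new Regex(@"&([A-Za-z]+)(?![A-Za-z;])");

    public static string Decode(string text)
    {
        // Step 1: append the missing ';' only when "&name;" is an entity
        // that WebUtility.HtmlDecode actually knows how to decode.
        string repaired = UnterminatedEntity.Replace(text, match =>
        {
            string candidate = match.Value + ";";
            string decoded = WebUtility.HtmlDecode(candidate);
            return decoded != candidate ? candidate : match.Value;
        });

        // Step 2: decode the now well-formed entities.
        return WebUtility.HtmlDecode(repaired);
    }
}
```

With this sketch, LenientEntityDecoder.Decode("Caf&eacute") and LenientEntityDecoder.Decode("Caf&eacute;") should both give "Café". If the input can be corrected at its source instead, that is the simpler fix.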




Answer 3:


You should use SecurityElement.Escape when working with XML files.

HtmlEncode will encode a lot of extra entities that are not required. XML only requires that you escape >, <, &, ", and ', which SecurityElement.Escape does.

When reading the file back through an XML parser, the parser does this conversion for you, so you shouldn't need to "decode" it.

EDIT: Of course this is only helpful when writing XML files.
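
A minimal sketch of that round trip, assuming the text is written into a hypothetical <value> element: SecurityElement.Escape replaces only the five XML-reserved characters (an accented letter like é passes through unchanged), and XElement.Parse restores the original text on read. The class and method names here are made up for illustration.

```csharp
using System;
using System.Security;
using System.Xml.Linq;

// Hypothetical helper for illustration; not part of the original answers.
public static class XmlEscapeSketch
{
    // Escapes <, >, &, " and ' so the text is safe inside XML.
    public static string Escape(string text)
    {
        return SecurityElement.Escape(text);
    }

    // Writes the escaped text into an element, then reads it back.
    // The XML parser un-escapes the entities, so no manual decoding is needed.
    public static string RoundTrip(string text)
    {
        string xml = "<value>" + SecurityElement.Escape(text) + "</value>";
        return XElement.Parse(xml).Value;
    }
}
```

For example, XmlEscapeSketch.Escape("Café <b> & more") leaves "Café" as it is and escapes only the markup characters.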




Answer 4:


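I think this works:

using System.Text;

// Note: this only converts the byte encoding (UTF-8 to UTF-16).
// It does not decode HTML entities such as &eacute;.
string utf8String = "Your string";

Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode;

// Encode the string as UTF-8 bytes, then convert those bytes to UTF-16.
byte[] utf8Bytes = utf8.GetBytes(utf8String);

byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);

// Turn the UTF-16 bytes back into characters and build the string.
char[] uniChars = new char[unicode.GetCharCount(unicodeBytes, 0, unicodeBytes.Length)];
unicode.GetChars(unicodeBytes, 0, unicodeBytes.Length, uniChars, 0);

string unicodeString = new string(uniChars);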



Answer 5:


Use HttpUtility.HtmlDecode. Example:

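using System;
using System.Web;      // HttpUtility (requires a reference to System.Web)
using System.Xml.Linq; // XDocument, XElement

class Program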
{
    static void Main()
    {
        XDocument doc = new XDocument(new XElement("test", 
            HttpUtility.HtmlDecode("caf&eacute;")));

        Console.WriteLine(doc);
        Console.ReadKey();
    }
}
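
Since the question reads from a UTF-8 text file and writes to another text file, the same decoding step can be sketched end to end as follows. This is an illustration under assumptions, not the answerer's code: FileEntityDecoder and its parameter names are hypothetical, and System.Net.WebUtility.HtmlDecode is swapped in for HttpUtility.HtmlDecode so that no System.Web reference is needed.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper for illustration; not part of the original answers.
public static class FileEntityDecoder
{
    public static void DecodeFile(string inputPath, string outputPath)
    {
        // Read the whole input file as UTF-8 text.
        string raw = File.ReadAllText(inputPath, Encoding.UTF8);

        // Replace HTML entities such as &eacute; with the characters they stand for.
        string decoded = WebUtility.HtmlDecode(raw);

        // Write the result as UTF-8 without a byte order mark.
        File.WriteAllText(outputPath, decoded, new UTF8Encoding(false));
    }
}
```

Note that an entity without its trailing semicolon, like the "Caf&eacute" in the question, is left as is by the decoder, so the input still needs to be well formed.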


Source: https://stackoverflow.com/questions/9875940/how-to-properly-decode-accented-characters-for-display
