Question
My raw input text file contains the string:
Caf&eacute; (should be Café)
The text file is UTF-8 encoded.
The output, let's say, goes to another text file, so it's not necessarily for a web page.
Which C# method(s) can I use to output the correct form, Café?
Apparently this is a common problem?
Answer 1:
Have you tried System.Web.HttpUtility.HtmlDecode("Caf&eacute;")? It returns 538M results.
Answer 2:
This is HTML encoded text. You need to decode it:
string decoded = HttpUtility.HtmlDecode(text);
UPDATE: The French symbol "é" has the HTML code "&eacute;", so you need to fix your input string.
Answer 3:
You should use SecurityElement.Escape when working with XML files. HtmlEncode will encode a lot of extra entities that are not required. XML only requires that you escape >, <, &, ", and ', which SecurityElement.Escape does.
When reading the file back through an XML parser, this conversion is done for you by the parser; you shouldn't need to "decode" it.
EDIT: Of course, this is only helpful when writing XML files.
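To make the round trip described above concrete, here is a minimal sketch. The class and method names (XmlEscapeDemo, EscapeForXml, ReadBack) and the "test" element are made up for illustration; only SecurityElement.Escape and XElement.Parse are real framework calls.

```csharp
using System.Security;
using System.Xml.Linq;

public static class XmlEscapeDemo
{
    // Escapes only the five characters XML treats as special: < > & " '
    // Accented characters such as "é" are left untouched.
    public static string EscapeForXml(string text)
    {
        return SecurityElement.Escape(text);
    }

    // Wraps already-escaped text in a (hypothetical) <test> element and lets
    // the XML parser turn the entities back into plain characters.
    public static string ReadBack(string escapedText)
    {
        XElement element = XElement.Parse("<test>" + escapedText + "</test>");
        return element.Value;
    }
}
```

For example, EscapeForXml("Café & <bar>") gives "Café &amp; &lt;bar&gt;", and passing that result to ReadBack returns the original text without any explicit decode call.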
Answer 4:
I think this works:
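using System.Text;

// Note: this only converts between byte encodings; it does not decode HTML entities such as &eacute;.
string utf8String = "Your string";
Encoding utf8 = Encoding.UTF8;
Encoding unicode = Encoding.Unicode; // UTF-16, little-endian
// Encode the string as UTF-8 bytes, then convert those bytes to UTF-16.
byte[] utf8Bytes = utf8.GetBytes(utf8String);
byte[] unicodeBytes = Encoding.Convert(utf8, unicode, utf8Bytes);
// Decode the UTF-16 bytes back into characters and build the resulting string.
char[] uniChars = new char[unicode.GetCharCount(unicodeBytes, 0, unicodeBytes.Length)];
unicode.GetChars(unicodeBytes, 0, unicodeBytes.Length, uniChars, 0);
string unicodeString = new string(uniChars);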
Answer 5:
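Use HttpUtility.HtmlDecode. Example:
using System;
using System.Web;
using System.Xml.Linq;

class Program
{
    static void Main()
    {
        // Decode the HTML entity before storing the text in the XML element.
        XDocument doc = new XDocument(new XElement("test",
            HttpUtility.HtmlDecode("caf&eacute;")));
        Console.WriteLine(doc);
        Console.ReadKey();
    }
}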
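Since the question is about writing to another text file rather than to XML, the same decoding step can be sketched for that case. This is only an illustration: the class name EntityDecoder and both path parameters are hypothetical, and it uses System.Net.WebUtility.HtmlDecode (available since .NET 4.0) instead of HttpUtility so that no reference to System.Web is needed.

```csharp
using System.IO;
using System.Net;
using System.Text;

public static class EntityDecoder
{
    // Decodes HTML entities such as "&eacute;" into the characters they represent.
    public static string Decode(string text)
    {
        return WebUtility.HtmlDecode(text);
    }

    // Reads a UTF-8 file, decodes its entities, and writes the result as UTF-8.
    // Both paths are placeholders supplied by the caller.
    public static void DecodeFile(string inputPath, string outputPath)
    {
        string raw = File.ReadAllText(inputPath, Encoding.UTF8);
        File.WriteAllText(outputPath, Decode(raw), Encoding.UTF8);
    }
}
```

Decode("Caf&eacute;") returns "Café". Note that Encoding.UTF8 writes a byte order mark at the start of the output file; passing new UTF8Encoding(false) instead omits it. As far as I know, an entity that is missing its trailing semicolon is left unchanged by the decoder, which is why the input string has to be well-formed first.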
Source: https://stackoverflow.com/questions/9875940/how-to-properly-decode-accented-characters-for-display