Why does XmlTextReader convert HTML-encoded UTF-8 characters to a UTF-8 string automatically?

我的未来我决定, submitted 2019-12-31 02:17:28

Question


I receive an XML file with encoding "ISO-8859-1" (Latin-1).

Within the file (among other tags) I have <OtherText>Example &quot;content&quot; And &#9472;</OtherText>

Now, for some reason, when I load this into XmlTextReader and read XmlReader.Value, it returns: Example "content" And ─

This then causes an error when the value is written to a database that only accepts Latin-1.

I have tried the following:

  • Converting into bytes and using Encoding.Convert to change from UTF-8 into Latin-1 (which successfully gives me a bunch of "?" instead)
  • Using StreamReader(file,Encoding.whatever) to load the file into XmlTextReader

And several variations thereof, plus different methods found on the internet and on StackOverflow itself.

I understand that .NET strings are UTF-16, but what I don't understand is why, given a fully Latin-1 XML file with CORRECT markup for the characters outside Latin-1 (markup that is compatible with older databases AND the web, e.g. HTML), the reader simply overrides that and outputs the actual characters ANYWAY.

Is there no way to get around this other than writing my own custom text parser?


Answer 1:


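I do not believe this is a problem with the encoding. What you're seeing is the XML text being un-escaped.

The problem is that &quot; is an XML entity reference and &#9472; is a numeric character reference, so XmlTextReader un-escapes both for you: &quot; becomes " and &#9472; becomes the character ─ (U+2500).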
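A minimal sketch of that un-escaping, assuming a .NET SDK with C# 9+ top-level statements and using XmlReader.Create rather than constructing XmlTextReader directly (the XML declaration is omitted for brevity). By the time the text is read, both references are already expanded, and ─ is a single UTF-16 code unit, U+2500, which has no byte in ISO-8859-1. That is why converting it to Latin-1 produces "?".

```csharp
using System;
using System.IO;
using System.Xml;

// The element from the question (XML declaration omitted for brevity).
string xml = "<OtherText>Example &quot;content&quot; And &#9472;</OtherText>";

string value;
using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
{
    reader.MoveToContent();                        // position on <OtherText>
    value = reader.ReadElementContentAsString();   // entity references are expanded here
}

// value is now: Example "content" And ─
Console.WriteLine(value);

// The last character is U+2500 (decimal 9472), outside Latin-1's U+0000..U+00FF.
Console.WriteLine($"U+{(int)value[value.Length - 1]:X4}");
```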

If you change this:

<OtherText>Example &quot;content&quot; And &#9472;</OtherText>

To this:

<OtherText>Example &amp;quot;content&amp;quot; And &amp;#9472;</OtherText>

Then XmlReader.Value returns:

   Example &quot;content&quot; And &#9472;

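Alternatively, wrap the value in a CDATA section, e.g. <OtherText><![CDATA[Example &quot;content&quot; And &#9472;]]></OtherText>, so the parser leaves the references untouched.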

Another option is to re-escape the string:

    using System.Security;
    ...
    // Re-escape the XML special characters (&, <, >, ", ') in the decoded text
    string val = SecurityElement.Escape(xmlReader.Value);
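Note that SecurityElement.Escape replaces only the five XML special characters (&, <, >, ", '). For the value in the question it turns the quotes back into &quot; but leaves ─ (U+2500) as is, and that character still cannot be stored in a Latin-1-only database. If the goal is Latin-1-safe text, one option is to re-escape only the characters above U+00FF as numeric character references. This is a different technique from the answer's, sketched here as an assumption. EscapeNonLatin1 is a hypothetical helper, not a .NET API, and the code assumes C# 9+ top-level statements.

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of .NET): keep every char in the Latin-1
// range U+0000..U+00FF and turn everything else into &#N; (decimal code point).
static string EscapeNonLatin1(string s)
{
    var sb = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
    {
        char c = s[i];
        if (c <= '\u00FF')
        {
            sb.Append(c); // representable in Latin-1, copy unchanged
        }
        else if (char.IsHighSurrogate(c) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
        {
            // Characters beyond the BMP arrive as a surrogate pair: emit one reference.
            sb.Append("&#").Append(char.ConvertToUtf32(c, s[i + 1])).Append(';');
            i++;
        }
        else
        {
            // Any other char above U+00FF (e.g. U+2500) becomes &#N;.
            sb.Append("&#").Append((int)c).Append(';');
        }
    }
    return sb.ToString();
}

// The decoded value as XmlReader.Value returns it in the question.
string decoded = "Example \"content\" And \u2500";
string latin1Safe = EscapeNonLatin1(decoded); // Example "content" And &#9472;
Console.WriteLine(latin1Safe);
```

The helper does not touch &, < or >. If the result is going back into XML, apply SecurityElement.Escape first and EscapeNonLatin1 second, so the &amp; produced by the first step is not mangled.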


Source: https://stackoverflow.com/questions/3308230/why-does-xmltextreader-convert-html-encoded-utf8-characters-to-utf8-string-autom
