XmlException while deserializing xml file in UTF-16 encoding format

狂风中的少年 submitted on 2019-12-06 03:02:47

Question


Using C#'s XmlSerializer.

While deserializing all the XML files in a given folder, I get an XmlException, "There is an error in XML document (0, 0)", whose InnerException is "There is no Unicode byte order mark. Cannot switch to Unicode".

All the XML files in the directory declare "UTF-16" encoding. The only difference is that some files are missing elements that are defined in the class I deserialize into.

For example, suppose I have three different kinds of XML files in my folder:

file1.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
</ns0:PaymentStatus>

file2.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
<PaymentStatus2 RowNum="1" FeedID="38" />
</ns0:PaymentStatus>

file3.xml

<?xml version="1.0" encoding="utf-16"?>
<ns0:PaymentStatus xmlns:ns0="http://my.PaymentStatus">
<PaymentStatus2 RowNum="1" FeedID="38" />
<PaymentStatus2 RowNum="2" FeedID="39" Amt="26.0000" />
</ns0:PaymentStatus>

I have a class to represent the above xml:

[XmlTypeAttribute(AnonymousType = true, Namespace = "http://my.PaymentStatus")]
[XmlRootAttribute("PaymentStatus", Namespace = "http://my.PaymentStatus", IsNullable = true)]
public class PaymentStatus
{
    // The child elements carry no namespace prefix in the sample files,
    // so they are mapped with an empty namespace.
    [XmlElementAttribute("PaymentStatus2", Namespace = "")]
    public PaymentStatus2[] PaymentStatus2 { get; set; }
}

[XmlTypeAttribute(AnonymousType = true)]
[XmlRootAttribute(Namespace = "", IsNullable = true)]
public class PaymentStatus2
{
    // Each property maps to an XML attribute of the same name.
    // Auto-properties default to 0, so no explicit constructor is needed.
    [XmlAttributeAttribute()]
    public byte RowNum { get; set; }

    [XmlAttributeAttribute()]
    public byte FeedID { get; set; }

    [XmlAttributeAttribute()]
    public decimal Amt { get; set; }
}

The following snippet does the deserializing for me:

foreach (string f in filePaths)
{
  XmlSerializer xsw = new XmlSerializer(typeof(PaymentStatus));
  FileStream fs = new FileStream(f, FileMode.Open);
  PaymentStatus config = (PaymentStatus)xsw.Deserialize(new XmlTextReader(fs));
}

Am I missing something? It must have something to do with the encoding, because when I manually replace UTF-16 with UTF-8 in the declaration, deserialization works just fine.


Answer 1:


I ran into this same error today working with a third party web service.

I followed Alexei's advice by using a StreamReader and setting the encoding explicitly. The StreamReader can then be passed to the XmlTextReader constructor. Here's an implementation of this using the code from the original question:

XmlSerializer xsw = new XmlSerializer(typeof(PaymentStatus));
foreach (string f in filePaths)
{
  // Decode the bytes as UTF-8 ourselves. When XmlTextReader is given a TextReader
  // rather than a Stream, it does not try to switch to the declared "utf-16".
  using (FileStream fs = new FileStream(f, FileMode.Open, FileAccess.Read))
  using (StreamReader stream = new StreamReader(fs, Encoding.UTF8))
  {
    PaymentStatus config = (PaymentStatus)xsw.Deserialize(new XmlTextReader(stream));
  }
}



Answer 2:


Most likely the encoding="utf-16" declaration does not match the encoding the XML files are actually stored in, so the parser fails when it tries to read the stream as UTF-16 text.

Since you comment that changing the "encoding" attribute to "utf-8" lets you read the text, I assume the files are actually UTF-8. You can easily verify that by opening the files as binary instead of text in your editor of choice (e.g. Visual Studio).
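If opening the files in a binary editor is inconvenient, the same check can be done in code by looking at the first few bytes. The following is a minimal sketch, not part of the original answer; the class and method names (BomSniffer, DescribeBom, DescribeFile) are made up for illustration, and it only distinguishes the UTF-8 and UTF-16 byte order marks (UTF-32 is ignored).

```csharp
using System;
using System.IO;

public static class BomSniffer
{
    // Describes the byte order mark at the start of the given bytes, if any.
    // Hypothetical helper: only UTF-8 and UTF-16 BOMs are recognized.
    public static string DescribeBom(byte[] head)
    {
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8 BOM";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 LE BOM";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 BE BOM";
        return "no BOM";
    }

    // Reads up to the first four bytes of a file and describes its BOM.
    public static string DescribeFile(string path)
    {
        byte[] head = new byte[4];
        int read;
        using (FileStream fs = File.OpenRead(path))
        {
            read = fs.Read(head, 0, head.Length);
        }
        Array.Resize(ref head, read); // the file may be shorter than four bytes
        return DescribeBom(head);
    }
}
```

A file that reports "no BOM" and starts with the single bytes 3C 3F ("<?") is stored in an 8-bit-compatible encoding such as UTF-8, whatever its declaration says.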

The most likely cause of such a mismatch is saving the XML with something like writer.Write(document.OuterXml): the string representation is produced first, which puts "utf-16" in the declaration, but the string is then written to a stream that uses UTF-8 encoding by default.
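One way such a file can come about is sketched below. This is an illustration under assumptions, not a claim about how the asker's files were actually produced: it serializes through a StringWriter (whose Encoding is UTF-16, so the declaration says "utf-16") and then saves the string with File.WriteAllText, which writes UTF-8 without a BOM when no encoding is given. The names MismatchDemo, SerializeToString and SaveMismatched are hypothetical.

```csharp
using System.IO;
using System.Xml.Serialization;

public static class MismatchDemo
{
    // Serializes to an in-memory string. Because StringWriter reports UTF-16
    // as its encoding, the XML declaration reads encoding="utf-16".
    public static string SerializeToString<T>(T value)
    {
        var serializer = new XmlSerializer(typeof(T));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, value);
            return writer.ToString();
        }
    }

    // Writes that string to disk. With no encoding argument, File.WriteAllText
    // uses UTF-8 without a BOM, so the bytes no longer match the declaration.
    public static void SaveMismatched<T>(string path, T value)
    {
        File.WriteAllText(path, SerializeToString(value));
    }
}
```

Reading such a file back through a Stream makes the parser see "utf-16" in the declaration while the bytes are single-byte text with no byte order mark, which matches the error reported in the question.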

Possible workaround - read the XML in a way that is symmetrical to the code that wrote it: read the file as a string and then load the XML from that string.
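A minimal sketch of that workaround, assuming the files really are UTF-8 on disk; the helper name StringLoader.DeserializeFile and its actualEncoding parameter are invented for this example. The file is decoded into a string first, and the serializer then reads from a StringReader, so the encoding named in the declaration is not used to decode anything.

```csharp
using System.IO;
using System.Text;
using System.Xml.Serialization;

public static class StringLoader
{
    // Decodes the file with the encoding it is actually stored in, then
    // deserializes from the resulting string.
    public static T DeserializeFile<T>(string path, Encoding actualEncoding)
    {
        string xml = File.ReadAllText(path, actualEncoding);
        var serializer = new XmlSerializer(typeof(T));
        using (var reader = new StringReader(xml))
        {
            return (T)serializer.Deserialize(reader);
        }
    }
}
```

With the classes from the question, a call might look like StringLoader.DeserializeFile<PaymentStatus>(f, Encoding.UTF8) inside the loop over filePaths.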

Proper fix - make sure the XML is stored in the encoding its declaration names.
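If you control the code that produces the files, one way to keep the declaration and the bytes in agreement is to let an XmlWriter write directly to the file stream with an explicit encoding. This is a sketch under that assumption; ConsistentWriter.Save is a hypothetical helper, and UTF-8 without a BOM is just one reasonable choice.

```csharp
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

public static class ConsistentWriter
{
    // Serializes straight to the file through an XmlWriter, so the encoding
    // written in the declaration is the one used for the bytes.
    public static void Save<T>(string path, T value)
    {
        var settings = new XmlWriterSettings
        {
            Encoding = new UTF8Encoding(false), // UTF-8, no byte order mark
            Indent = true
        };
        var serializer = new XmlSerializer(typeof(T));
        using (FileStream fs = File.Create(path))
        using (XmlWriter writer = XmlWriter.Create(fs, settings))
        {
            serializer.Serialize(writer, value);
        }
    }
}
```

Files written this way declare encoding="utf-8" and can be read back from a plain FileStream without any special handling.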




Answer 3:


I don't know if this is the best way, but if my input stream does not contain a BOM I just use XDocument in order to handle different encodings... for example:

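public static T DeserializeFromString<T>(String xml) where T : class
{
    try
    {
        // Parsing from a string sidesteps byte-level encoding detection entirely.
        var xDoc = XDocument.Parse(xml);
        using (var xmlReader = xDoc.Root.CreateReader())
        {
            return new XmlSerializer(typeof(T)).Deserialize(xmlReader) as T;
        }
    }
    catch (Exception)
    {
        // Swallows every failure and returns null; see the note below.
        return default(T);
    }
}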

Of course you'll probably want to rethrow any exception, but in the code I copied this from I didn't need to know if or why it failed, so I just swallowed the exception.



Source: https://stackoverflow.com/questions/25298355/xmlexception-while-deserializing-xml-file-in-utf-16-encoding-format
