I have an XML file beginning like this:
Do you have a byte-order mark (BOM) at the beginning of your XML, and does it match your encoding? If you chop out your header, you'll also chop out the BOM, and if the BOM was incorrect, that would explain why subsequent parsing then works.
You may need to inspect your document at the byte level to see the BOM.
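As a rough illustration of inspecting the document at the byte level, the sketch below compares the first few bytes against the common Unicode BOM signatures. The class and method names (`BomInspector`, `DetectBom`) are made up for this example, and it assumes you already have the raw file contents as a byte array.

```csharp
static class BomInspector
{
    // Returns a label for the BOM found at the start of the data, or null if none matches.
    public static string DetectBom(byte[] data)
    {
        // Longer signatures are checked first, because the UTF-32 LE BOM
        // starts with the same two bytes as the UTF-16 LE BOM.
        if (StartsWith(data, new byte[] { 0xFF, 0xFE, 0x00, 0x00 })) return "UTF-32 LE";
        if (StartsWith(data, new byte[] { 0x00, 0x00, 0xFE, 0xFF })) return "UTF-32 BE";
        if (StartsWith(data, new byte[] { 0xEF, 0xBB, 0xBF })) return "UTF-8";
        if (StartsWith(data, new byte[] { 0xFF, 0xFE })) return "UTF-16 LE";
        if (StartsWith(data, new byte[] { 0xFE, 0xFF })) return "UTF-16 BE";
        return null;
    }

    // True when data begins with every byte of signature, in order.
    private static bool StartsWith(byte[] data, byte[] signature)
    {
        if (data == null || data.Length < signature.Length) return false;
        for (int i = 0; i < signature.Length; i++)
        {
            if (data[i] != signature[i]) return false;
        }
        return true;
    }
}
```

You could call it as `BomInspector.DetectBom(File.ReadAllBytes(path))`, where `path` is whatever file you are checking, and compare the result with the encoding named in the XML declaration.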
Try this:
// Drop anything (such as a stray BOM character) that precedes the first '<'.
int startIndex = xmlString.IndexOf('<');
if (startIndex > 0)
{
    xmlString = xmlString.Remove(0, startIndex);
}
If you only have bytes, you could either load the bytes into a stream:
XmlDocument oXML;
using (MemoryStream oStream = new MemoryStream(oBytes))
{
    oXML = new XmlDocument();
    oXML.Load(oStream);
}
Or you could convert the bytes into a string (presuming that you know the encoding) before loading the XML:
string sXml;
XmlDocument oXml;
sXml = Encoding.UTF8.GetString(oBytes);
oXml = new XmlDocument();
oXml.LoadXml(sXml);
I've shown my examples as .NET 2.0 compatible; if you're using .NET 3.5 you can use XDocument instead of XmlDocument.
Load the bytes into a stream:
XDocument oXML;
using (MemoryStream oStream = new MemoryStream(oBytes))
using (XmlTextReader oReader = new XmlTextReader(oStream))
{
    oXML = XDocument.Load(oReader);
}
Convert the bytes into a string:
string sXml;
XDocument oXml;
sXml = Encoding.UTF8.GetString(oBytes);
oXml = XDocument.Parse(sXml);
Why bother reading the file as a byte sequence and then converting it to a string when it is an XML file? Just let the framework do the loading for you and cope with the encodings:
var xml = XDocument.Load("test.xml");
My first thought was that the encoding is Unicode when parsing XML from a .NET string type. It seems, though, that XDocument's parsing is quite forgiving with respect to this.
The problem is actually related to the UTF-8 preamble/byte order mark (BOM), which is a three-byte signature optionally present at the start of a UTF-8 stream. These three bytes are a hint as to the encoding being used in the stream.
You can determine the preamble of an encoding by calling the GetPreamble method on an instance of the System.Text.Encoding class. For example:
// returns { 0xEF, 0xBB, 0xBF }
byte[] preamble = Encoding.UTF8.GetPreamble();
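If you do need a string rather than a reader, one option is to skip the preamble yourself before decoding, since `Encoding.UTF8.GetString` decodes those three bytes into a leading U+FEFF character instead of discarding them. The following is only a sketch under that assumption; `PreambleHelper` and `DecodeUtf8WithoutPreamble` are illustrative names, not framework APIs.

```csharp
using System.Text;

static class PreambleHelper
{
    // Decodes UTF-8 bytes to a string, skipping the UTF-8 preamble if it is present.
    public static string DecodeUtf8WithoutPreamble(byte[] bytes)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble(); // { 0xEF, 0xBB, 0xBF }
        int offset = 0;

        if (bytes.Length >= preamble.Length)
        {
            bool hasPreamble = true;
            for (int i = 0; i < preamble.Length; i++)
            {
                if (bytes[i] != preamble[i])
                {
                    hasPreamble = false;
                    break;
                }
            }
            if (hasPreamble)
            {
                offset = preamble.Length;
            }
        }

        // Decode only the bytes that follow the preamble.
        return Encoding.UTF8.GetString(bytes, offset, bytes.Length - offset);
    }
}
```

The resulting string should then start at the first character of the document and can be passed to `XDocument.Parse`.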
The preamble should be handled correctly by XmlTextReader, so simply load your XDocument from an XmlTextReader:
XDocument xml;
using (var xmlStream = new MemoryStream(fileContent))
using (var xmlReader = new XmlTextReader(xmlStream))
{
    xml = XDocument.Load(xmlReader);
}