I'm trying to insert into an XML column (SQL Server 2008 R2), but the server's complaining:
System.Data.SqlClient.SqlException (0x80131904):
XML parsing: ... unable to switch the encoding
You are serializing to a string rather than a byte array, so at this point no encoding has happened yet.
What does the start of "messageToLog" look like? Is the XML specifying an encoding (e.g. utf-8) which subsequently turns out to be wrong?
Edit
Based on your further info it sounds like the string is automatically converted to utf-8 when it is passed to the database, but the database chokes because the XML declaration says it is utf-16.
In which case, you don't need to serialize to utf-8. You need to serialize with the "encoding=" omitted from the XML. The XmlFragmentWriter (not a standard part of .Net, Google it) lets you do this.
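If you don't want to hunt down XmlFragmentWriter, a minimal sketch of the same idea using the standard XmlWriterSettings is below. Note it drops the whole XML declaration rather than just the encoding attribute, which the XML column also accepts; MyMessage and messageToLog are the names from the question, so adjust to your own types:

using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Serialize without any XML declaration so the database never sees a
// conflicting encoding value.
var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
var serializer = new XmlSerializer(typeof(MyMessage));

string xml;
using (var stringWriter = new StringWriter())
{
    using (var xmlWriter = XmlWriter.Create(stringWriter, settings))
    {
        serializer.Serialize(xmlWriter, messageToLog);
    }
    xml = stringWriter.ToString();   // no <?xml ... ?> declaration at all
}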
Although a .NET string is always UTF-16, you need to serialize the object using UTF-16 encoding. That should be something like this:
public static string ToString(object source, Type type, Encoding encoding)
{
    // Requires System.IO, System.Text, System.Xml, and System.Xml.Serialization.

    // The string to hold the object content
    String content;

    // Create a MemoryStream into which the data can be written and read
    using (var stream = new MemoryStream())
    {
        // Create the XML serializer; the serializer needs to know the type
        // of the object that will be serialized
        var xmlSerializer = new XmlSerializer(type);

        // Create an XmlTextWriter to write the XML object source; we are going
        // to define the encoding in the constructor
        using (var writer = new XmlTextWriter(stream, encoding))
        {
            // Save the state of the object into the stream
            xmlSerializer.Serialize(writer, source);

            // Flush the stream
            writer.Flush();

            // Read the stream back into a string
            using (var reader = new StreamReader(stream, encoding))
            {
                // Set the stream position to the beginning
                stream.Position = 0;

                // Read the stream into a string
                content = reader.ReadToEnd();
            }
        }
    }

    // Return the xml string with the object content
    return content;
}
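For example, a hypothetical call site (assuming the helper lives in the same class, and using MyMessage and messageToLog from the question):

// Produces a string whose declaration reads encoding="utf-16",
// which the XML column accepts when the value is sent as NVARCHAR.
string xml = ToString(messageToLog, typeof(MyMessage), Encoding.Unicode);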
By setting the encoding to Encoding.Unicode, not only will the string be UTF-16, but the serialized XML will also declare itself as UTF-16:
<?xml version="1.0" encoding="utf-16"?>
The default encoding for an XML serializer should be UTF-16. Just to make sure, you can try:
XmlSerializer serializer = new XmlSerializer(typeof(YourObject));

// Create a MemoryStream here; we are just working
// exclusively in memory
System.IO.Stream stream = new System.IO.MemoryStream();

// The XmlTextWriter takes a stream and an encoding
// in one of its constructors (UTF-16 is Encoding.Unicode)
System.Xml.XmlTextWriter xtWriter = new System.Xml.XmlTextWriter(stream, Encoding.Unicode);

serializer.Serialize(xtWriter, yourObjectInstance);
xtWriter.Flush();
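To see the result, the stream can then be rewound and read back into a string; a small sketch building on the snippet above:

// Rewind and read back what was written; the declaration should say utf-16.
stream.Position = 0;
using (var reader = new StreamReader(stream, Encoding.Unicode))
{
    string xml = reader.ReadToEnd();
    Console.WriteLine(xml);   // <?xml version="1.0" encoding="utf-16"?>...
}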
@ziesemer's answer is the only fully correct answer to this question and the linked duplicates of this question. However, it could still use a little more explanation and some clarification. Consider this an extension of @ziesemer's answer.

Even if they produce the desired result, most answers to this question (including the duplicate question) are convoluted and go through many unnecessary steps. The main issue here is the overall lack of understanding regarding how the XML datatype actually works in SQL Server (not surprising given that it isn't well documented). The XML type:

- Is not stored as the XML string that was passed in; it is converted to an optimized binary format (as noted on the msdn site). The optimizations include storing element and attribute names in a dictionary and replacing each occurrence with a numeric ID. For example, "<ElementName>...</ElementName>" takes up 27 characters (i.e. 54 bytes) in string form, but only 11 characters (i.e. 22 bytes) when stored in the XML type. And that is for a single instance of it. Multiple instances take up additional multiples of the 54 bytes. But in the XML type, each instance only takes up the space of that numeric ID, most likely a 4-byte int.
- Can have 8-bit / non-UTF-16 data passed in. In this case, you need to make sure that the string is not an NVARCHAR string (i.e. not prefixed with an upper-case "N" for literals, not declared as NVARCHAR when dealing with T-SQL variables, and not declared as SqlDbType.NVarChar in .NET). AND, you need to make sure that you do have the XML declaration, and that it specifies the correct encoding.
PRINT 'VARCHAR / UTF-8:';
DECLARE @XML_VC_8 XML;
SET @XML_VC_8 = '<?xml version="1.0" encoding="utf-8"?><test/>';
PRINT 'Success!'
-- Success!
GO
PRINT '';
PRINT 'NVARCHAR / UTF-8:';
DECLARE @XML_NVC_8 XML;
SET @XML_NVC_8 = N'<?xml version="1.0" encoding="utf-8"?><test/>';
PRINT 'Success!'
/*
Msg 9402, Level 16, State 1, Line XXXXX
XML parsing: line 1, character 38, unable to switch the encoding
*/
GO
PRINT '';
PRINT 'VARCHAR / UTF-16:';
DECLARE @XML_VC_16 XML;
SET @XML_VC_16 = '<?xml version="1.0" encoding="utf-16"?><test/>';
PRINT 'Success!'
/*
Msg 9402, Level 16, State 1, Line XXXXX
XML parsing: line 1, character 38, unable to switch the encoding
*/
GO
PRINT '';
PRINT 'NVARCHAR / UTF-16:';
DECLARE @XML_NVC_16 XML;
SET @XML_NVC_16 = N'<?xml version="1.0" encoding="utf-16"?><test/>';
PRINT 'Success!'
-- Success!
As you can see, when the input string is NVARCHAR, then the XML declaration can be included, but it needs to be "UTF-16".

When the input string is VARCHAR, then the XML declaration can also be included, but it cannot be "UTF-16". It can, however, be any valid 8-bit encoding, in which case the bytes for that encoding will be converted into UTF-16, as shown below:
DECLARE @XML XML;
SET @XML = '<?xml version="1.0" encoding="utf-8"?><test attr="'
+ CHAR(0xF0) + CHAR(0x9F) + CHAR(0x98) + CHAR(0x8E) + '"/>';
SELECT @XML;
-- <test attr="😎" />
A string is always UTF-16 in .NET, so as long as you stay inside your managed app you don't have to care about which encoding it is.

The problem is more likely where you talk to the SQL Server. Your question doesn't show that code, so it's hard to pinpoint the exact error. My suggestion is to check whether there's a property or attribute you can set in that code that specifies the encoding of the data sent to the server.
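For instance, with ADO.NET the relevant knob is the parameter's data type. A minimal sketch is below; MessageLog and Payload are made-up names, conn is an open SqlConnection, and (as the tests further down this page show) the string's declaration must then either be absent or say "utf-16":

using (var cmd = new SqlCommand("INSERT INTO MessageLog(Payload) VALUES (@Xml)", conn))
{
    // Send the value typed as XML; since a .NET string is UTF-16,
    // the declaration must not claim "utf-8".
    cmd.Parameters.Add("@Xml", SqlDbType.Xml).Value =
        "<?xml version=\"1.0\" encoding=\"utf-16\"?><Test/>";
    cmd.ExecuteNonQuery();
}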
This question is a near-duplicate of 2 others, and surprisingly - while this one is the most recent - I believe it is missing the best answer.
The duplicates, and what I believe to be their best answers, are:
In the end, it doesn't matter what encoding is declared or used, as long as the XmlReader can parse it locally within the application server.
As was confirmed in Most efficient way to read XML in ADO.net from XML type column in SQL server?, SQL Server stores XML in an efficient binary format. By using the SqlXml class, ADO.net can communicate with SQL Server in this binary format, and not require the database server to do any serialization or de-serialization of XML. This should also be more efficient for transport across the network.
By using SqlXml, XML will be sent pre-parsed to the database, and then the DB doesn't need to know anything about character encodings - UTF-16 or otherwise. In particular, note that the XML declarations aren't even persisted with the data in the database, regardless of which method is used to insert it.
Please refer to the above-linked answers for methods that look very similar to this, but this example is mine:
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using System.IO;
using System.Xml;

static class XmlDemo {
    static void Main(string[] args) {
        using(SqlConnection conn = new SqlConnection()) {
            conn.ConnectionString = "...";
            conn.Open();

            using(SqlCommand cmd = new SqlCommand("Insert Into TestData(Xml) Values (@Xml)", conn)) {
                cmd.Parameters.Add(new SqlParameter("@Xml", SqlDbType.Xml) {
                    // Works.
                    // Value = "<Test/>"

                    // Works. XML Declaration is not persisted!
                    // Value = "<?xml version=\"1.0\"?><Test/>"

                    // Works. XML Declaration is not persisted!
                    // Value = "<?xml version=\"1.0\" encoding=\"UTF-16\"?><Test/>"

                    // Error ("unable to switch the encoding" SqlException).
                    // Value = "<?xml version=\"1.0\" encoding=\"UTF-8\"?><Test/>"

                    // Works. XML Declaration is not persisted!
                    Value = new SqlXml(XmlReader.Create(new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Test/>")))
                });

                cmd.ExecuteNonQuery();
            }
        }
    }
}
Note that I would not consider the last (non-commented) example to be "production-ready", but left it as-is to be concise and readable. If done properly, both the StringReader and the created XmlReader should be initialized within using statements to ensure that their Close() methods are called when complete.
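A sketch of that tidier form (same table and XML as above, with both readers wrapped in using statements):

using (SqlCommand cmd = new SqlCommand("Insert Into TestData(Xml) Values (@Xml)", conn))
using (StringReader stringReader = new StringReader("<?xml version=\"1.0\" encoding=\"UTF-8\"?><Test/>"))
using (XmlReader xmlReader = XmlReader.Create(stringReader))
{
    cmd.Parameters.Add(new SqlParameter("@Xml", SqlDbType.Xml) {
        Value = new SqlXml(xmlReader)
    });
    cmd.ExecuteNonQuery();
}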
From what I've seen, the XML declarations are never persisted when using an XML column. Even without using .NET and just using this direct SQL insert statement, for example, the XML declaration is not saved into the database with the XML:
Insert Into TestData(Xml) Values ('<?xml version="1.0" encoding="UTF-8"?><Test/>');
Now in terms of the OP's question, the object to be serialized still needs to be converted into an XML structure from the MyMessage object, and XmlSerializer is still needed for this. However, at worst, instead of serializing to a String, the message could instead be serialized to an XmlDocument - which can then be passed to SqlXml through a new XmlNodeReader - avoiding a de-serialization/serialization trip to a string. (See http://blogs.msdn.com/b/jongallant/archive/2007/01/30/how-to-convert-xmldocument-to-xmlreader-for-sqlxml-data-type.aspx for details and an example.)
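A sketch of that approach (MyMessage and messageToLog are the OP's names; the XmlDocument-to-XmlNodeReader step follows the pattern described in the linked post):

// Serialize directly into an XmlDocument, then hand it to SqlXml via an
// XmlNodeReader - no intermediate string, so no encoding declaration to fight.
XmlDocument doc = new XmlDocument();
XmlSerializer serializer = new XmlSerializer(typeof(MyMessage));
using (XmlWriter writer = doc.CreateNavigator().AppendChild())
{
    serializer.Serialize(writer, messageToLog);
}
SqlXml sqlXml = new SqlXml(new XmlNodeReader(doc));
// sqlXml can then be used as the Value of a SqlDbType.Xml parameter, as above.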
Everything here was developed against and tested with .NET 4.0 and SQL Server 2008 R2.
Please don't be wasteful by running XML through extra conversions (de-serializations and serializations - to DOM, strings, or otherwise), as shown in other answers here and elsewhere.