This is not a question about how to overcome the "XML parsing: ... illegal xml character" error, but about why it happens. I know tha
Please permit me to answer my own question so that I understand it fully myself. I won't accept this as the answer; it was the combination of the other answers that led me here. If this answer helps you in the future, please upvote the other posts as well.
The underlying rule is that XML containing Unicode characters should be passed to SQL Server as Unicode and parsed as Unicode. C# should therefore generate the XML as UTF-16, which is the default for both SSMS and .NET.
This variable declares the XML as UTF-8, but the en-dash character cannot appear in it unless it is actually encoded as UTF-8. This is wrong:
DECLARE @badxml xml = '
';
XML parsing: line 3, character 29, illegal xml character
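A minimal sketch of why this fails, under some assumptions: the `<note>` element and its text are hypothetical (not the original document), and the database is assumed to use a single-byte code page such as 1252 rather than a `_UTF8` collation. In that case the en-dash in a non-Unicode literal is stored as the single byte 0x96, which is not a valid UTF-8 sequence, so a parser that has been told the content is UTF-8 rejects it.

```sql
-- Hypothetical input: the <note> element and its text are made up for illustration.
-- Assumes a single-byte code page collation (e.g. 1252), not a _UTF8 collation.
DECLARE @raw varchar(200) =
    '<?xml version="1.0" encoding="UTF-8"?><note>2010 – 2011</note>';

-- Under code page 1252 this returns 0x96: one byte, not the three-byte
-- UTF-8 sequence (0xE2 0x80 0x93) that the declaration promises.
SELECT CAST('–' AS varbinary(4)) AS en_dash_bytes;

BEGIN TRY
    -- The parser honours encoding="UTF-8" and hits the stray 0x96 byte.
    DECLARE @x xml = CAST(@raw AS xml);
END TRY
BEGIN CATCH
    -- Surfaces the "illegal xml character" message instead of aborting the batch.
    SELECT ERROR_MESSAGE() AS error_message;
END CATCH;
```

Routing the string through a `varchar` variable and `TRY...CATCH` is only there so the error can be inspected; the plain `DECLARE ... xml = '...'` form fails the same way.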
Another approach that doesn't work is to change UTF-8 to UTF-16 in the XML declaration. The string literal here is not Unicode, so the implicit conversion fails:
DECLARE @xml xml = '
';
XML parsing: line 1, character 56, unable to switch the encoding
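The mismatch can be made visible with a small sketch. The `<note>` document is again hypothetical, and a single-byte code page is assumed. The declaration says UTF-16, but a literal without the `N` prefix holds one byte per character, so the parser cannot switch to the encoding it has been told to use.

```sql
-- Hypothetical input; same text stored as non-Unicode and as Unicode.
DECLARE @narrow varchar(200)  =  '<?xml version="1.0" encoding="UTF-16"?><note>2010 – 2011</note>';
DECLARE @wide   nvarchar(200) = N'<?xml version="1.0" encoding="UTF-16"?><note>2010 – 2011</note>';

-- With a single-byte code page, @wide occupies twice as many bytes as @narrow:
-- only @wide is really UTF-16.
SELECT DATALENGTH(@narrow) AS narrow_bytes, DATALENGTH(@wide) AS wide_bytes;

BEGIN TRY
    -- Declared UTF-16, but the bytes are single-byte: "unable to switch the encoding".
    DECLARE @x xml = CAST(@narrow AS xml);
END TRY
BEGIN CATCH
    SELECT ERROR_MESSAGE() AS error_message;
END CATCH;
```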
Alternatives that work are:
1) Leave the declaration as UTF-8, but write the en-dash as a hexadecimal entity (character reference):
DECLARE @xml xml = '
';
2) As above, but write the en-dash as a decimal entity (character reference):
DECLARE @xml xml = '
';
3) Keep the original en-dash character, but remove the UTF-8 encoding from the declaration (SSMS then applies its default, UTF-16):
DECLARE @xml xml = '
';
4) Retain the UTF-16 declaration, but make the string Unicode (note the N prefix on the literal before it is cast to XML):
DECLARE @xml xml = N'
';
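For comparison, here is a sketch of the four working alternatives side by side. The `<note>` element, its text and the variable names are hypothetical stand-ins rather than the original document. U+2013 is the en-dash, so `&#x2013;` and `&#8211;` are its hexadecimal and decimal character references.

```sql
-- Hypothetical document used for all four variants; only the encoding handling differs.

-- 1) UTF-8 declaration, en-dash as a hexadecimal character reference.
DECLARE @hex xml = '<?xml version="1.0" encoding="UTF-8"?><note>2010 &#x2013; 2011</note>';

-- 2) UTF-8 declaration, en-dash as a decimal character reference.
DECLARE @dec xml = '<?xml version="1.0" encoding="UTF-8"?><note>2010 &#8211; 2011</note>';

-- 3) Literal en-dash, no encoding in the declaration.
--    Assumes the database code page can represent the en-dash (1252 can).
DECLARE @nodecl xml = '<?xml version="1.0"?><note>2010 – 2011</note>';

-- 4) Literal en-dash, UTF-16 declaration, Unicode literal (N prefix).
DECLARE @utf16 xml = N'<?xml version="1.0" encoding="UTF-16"?><note>2010 – 2011</note>';

-- Read the text back as nvarchar to compare what each variant stored.
SELECT
    @hex.value('(/note)[1]', 'nvarchar(50)')    AS hex_ref,
    @dec.value('(/note)[1]', 'nvarchar(50)')    AS dec_ref,
    @nodecl.value('(/note)[1]', 'nvarchar(50)') AS no_encoding,
    @utf16.value('(/note)[1]', 'nvarchar(50)')  AS utf16_literal;
```

Variants 1 and 2 keep the string pure ASCII, so they do not depend on the database code page. Variant 4 matches the rule stated at the top: the XML is passed to SQL Server as Unicode and parsed as Unicode.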