Why does en-dash (–) trigger illegal XML character error (C#/SSMS)?

星月不相逢 2021-01-05 04:19

This is not a question about how to overcome the "XML parsing: ... illegal xml character" error, but about why it happens. I know tha…

4 Answers
  •  孤城傲影
    2021-01-05 04:55

    Please permit me to answer my own question so that I fully understand it myself. I won't accept this as the answer; it was the combination of the other answers that led me here. If this answer helps you in the future, please upvote the other posts as well.

    The basic underlying rule is that XML containing Unicode characters should be passed to SQL Server, and parsed by it, as Unicode. Therefore C# should generate the XML as UTF-16, which is the default in both SSMS and .NET.

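    Cause of the original problem

    This variable declares the XML as UTF-8, but the literal has no N prefix, so it is not Unicode: the en-dash is stored as a single byte in the code page of the database collation (0x96 in Windows-1252) instead of as the three-byte UTF-8 sequence E2 80 93. When SQL Server honors the UTF-8 declaration and decodes the string as UTF-8, that byte is illegal. This is wrong: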

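    -- Sample payload: an en-dash typed directly into a non-Unicode ('...') literal
    DECLARE @badxml xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <root>
      <item>Opening hours: 9:00 – 17:00</item>
    </root>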
    ';
    

    XML parsing: line 3, character 29, illegal xml character
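    To see the byte-level cause outside SQL Server, here is a minimal Python sketch. It assumes the database collation uses the Windows-1252 code page (an assumption; check your own collation), the sample line is made up, and `first_invalid_utf8` is a hypothetical helper, not part of any SQL Server API. Encoding the line as cp1252 models what a '...' literal holds; decoding it as UTF-8 models what the encoding="UTF-8" declaration asks the parser to do.

```python
# Assumption: the database collation's code page is Windows-1252 ("cp1252").
# The sample line is made up for illustration.
line = "  <item>Opening hours: 9:00 \u2013 17:00</item>"

# A non-Unicode ('...') literal holds one byte per character in the code page.
varchar_bytes = line.encode("cp1252")
# A genuine UTF-8 document would hold the multi-byte UTF-8 sequence instead.
utf8_bytes = line.encode("utf-8")

print(varchar_bytes[28:29])  # b'\x96'
print(utf8_bytes[28:31])     # b'\xe2\x80\x93'


def first_invalid_utf8(data: bytes):
    """Hypothetical helper: 1-based position of the first byte that is not
    valid UTF-8, or None if the whole buffer decodes cleanly."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start + 1
    return None


print(first_invalid_utf8(varchar_bytes))  # 29
print(first_invalid_utf8(utf8_bytes))     # None
```

    The en-dash becomes the single byte 0x96, which can never start a UTF-8 sequence, whereas real UTF-8 spells it E2 80 93.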

    Another approach that doesn't work is changing UTF-8 to UTF-16 in the XML declaration. The string is still a single-byte, non-Unicode literal, so SQL Server cannot switch it to UTF-16 and the implicit conversion fails:

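    DECLARE @xml xml = '<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
    <root>
      <item>Opening hours: 9:00 – 17:00</item>
    </root>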
    ';
    

    XML parsing: line 1, character 56, unable to switch the encoding
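    A second Python sketch shows why relabelling the declaration cannot help. It assumes the '...' literal is stored in Windows-1252 and an N'...' literal as little-endian UTF-16; the declaration string is only an example. A single-byte literal has one byte per character while UTF-16 needs two, so the same bytes cannot simply be reread as UTF-16.

```python
# Assumption: the '...' literal is stored in Windows-1252; an N'...' literal
# is stored as UTF-16 (little-endian code units, as on Windows).
decl = '<?xml version="1.0" encoding="UTF-16" standalone="yes"?>'

single_byte = decl.encode("cp1252")   # one byte per character
utf16 = decl.encode("utf-16-le")      # two bytes per character

print(len(decl), len(single_byte), len(utf16))  # 56 56 112
print(single_byte[:4])  # b'<?xm'
print(utf16[:4])        # b'<\x00?\x00'

# Rereading the single-byte data as UTF-16 pairs unrelated bytes together,
# so the text no longer even starts with "<?".
print(single_byte[:4].decode("utf-16-le") == "<?")  # False
```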

    Solutions

    Alternatives that work are:

    1) Leave the declaration as UTF-8, but replace the en-dash with a hexadecimal character reference, &#x2013; (reference):

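    DECLARE @xml xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <root>
      <item>Opening hours: 9:00 &#x2013; 17:00</item>
    </root>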
    ';
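    Option 1 can be automated when the XML is built as text. This Python sketch uses a hypothetical helper, `to_hex_char_refs`, that replaces every non-ASCII character with a hexadecimal character reference; it assumes `&`, `<` and `>` have already been escaped, and the sample text is made up. The resulting document is pure ASCII, so its bytes are the same in UTF-8 and in a single-byte code page.

```python
import xml.etree.ElementTree as ET


def to_hex_char_refs(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with &#xHHHH;.
    Assumes &, < and > in `text` have already been escaped."""
    return "".join(c if ord(c) < 128 else f"&#x{ord(c):X};" for c in text)


body = to_hex_char_refs("Opening hours: 9:00 \u2013 17:00")
print(body)  # Opening hours: 9:00 &#x2013; 17:00

doc = ('<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
       f"<root><item>{body}</item></root>")
print(doc.isascii())  # True

# Pure ASCII: cp1252 bytes and UTF-8 bytes are identical, so the UTF-8 declaration holds.
assert doc.encode("cp1252") == doc.encode("utf-8")
root = ET.fromstring(doc.encode("cp1252"))
print(root.find("item").text == "Opening hours: 9:00 \u2013 17:00")  # True
```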
    

    2) As above, but with a decimal character reference, &#8211; (reference):

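    DECLARE @xml xml = '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
    <root>
      <item>Opening hours: 9:00 &#8211; 17:00</item>
    </root>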
    ';
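    For the decimal form, Python's built-in `xmlcharrefreplace` error handler already performs the substitution. This is a sketch with made-up sample text; note that the handler does not escape `&`, `<` or `>`, so markup characters must be escaped beforehand.

```python
import xml.etree.ElementTree as ET

text = "Opening hours: 9:00 \u2013 17:00"  # made-up sample text

# The built-in "xmlcharrefreplace" handler writes non-encodable characters as
# decimal references (&#NNNN;). It does not escape &, < or >.
ascii_bytes = text.encode("ascii", errors="xmlcharrefreplace")
print(ascii_bytes)  # b'Opening hours: 9:00 &#8211; 17:00'

doc = (b'<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
       b"<root><item>" + ascii_bytes + b"</item></root>")
print(ET.fromstring(doc).find("item").text == text)  # True
```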
    

    3) Keep the literal en-dash, but remove the UTF-8 encoding from the declaration (the string is then no longer decoded as UTF-8, and SSMS applies UTF-16, its default):

    DECLARE @xml xml = '<?xml version="1.0" standalone="yes"?>
    <root>
      <item>Opening hours: 9:00 – 17:00</item>
    </root>
    ';
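    The following Python sketch models option 3 under an assumption about SQL Server's behaviour: with no encoding attribute, the varchar bytes are read in the literal's code page (Windows-1252 assumed here) rather than as UTF-8, and the resulting Unicode text is what gets parsed. Python's parser is only used to check that the decoded text is well-formed; it does not reproduce SQL Server's parser.

```python
import xml.etree.ElementTree as ET

# Assumption: with no encoding attribute, SQL Server reads a varchar literal in
# its collation's code page (Windows-1252 assumed here), not as UTF-8.
doc = ('<?xml version="1.0" standalone="yes"?>\n'
       "<root>\n"
       "  <item>Opening hours: 9:00 \u2013 17:00</item>\n"
       "</root>")
varchar_bytes = doc.encode("cp1252")
print(varchar_bytes.count(b"\x96"))  # 1  (the en-dash as a single byte)

# Decode with the code page first; the result is ordinary Unicode text.
text = varchar_bytes.decode("cp1252")
root = ET.fromstring(text)
print(root.find("item").text == "Opening hours: 9:00 \u2013 17:00")  # True
```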
    

    4) Keep the UTF-16 declaration, but make the string itself Unicode (note the N prefix before the opening quote), so the declared and actual encodings match:

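    DECLARE @xml xml = N'<?xml version="1.0" encoding="UTF-16" standalone="yes"?>
    <root>
      <item>Opening hours: 9:00 – 17:00</item>
    </root>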
    ';
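    Finally, a Python sketch of option 4: the declaration says UTF-16 and the bytes really are UTF-16, as they are for an N'...' (nvarchar) literal. Python's `utf-16` codec prepends a byte-order mark, which the parser uses to detect the encoding; whether SQL Server sees a BOM is not something this sketch models, and the sample text is made up.

```python
import xml.etree.ElementTree as ET

# The declaration says UTF-16 and the bytes really are UTF-16, as for an
# N'...' (nvarchar) literal.
doc = ('<?xml version="1.0" encoding="UTF-16" standalone="yes"?>\n'
       "<root>\n"
       "  <item>Opening hours: 9:00 \u2013 17:00</item>\n"
       "</root>")
nvarchar_bytes = doc.encode("utf-16")  # Python's utf-16 codec prepends a BOM
print(nvarchar_bytes[:2] in (b"\xff\xfe", b"\xfe\xff"))  # True

root = ET.fromstring(nvarchar_bytes)
print(root.find("item").text == "Opening hours: 9:00 \u2013 17:00")  # True
```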
    
