SQL Server: Replace invalid XML characters from a VARCHAR(MAX) field

余生分开走 2020-12-03 12:06

I have a VARCHAR(MAX) field which is being interfaced to an external system in XML format. The following errors were thrown by the interface:

3 Answers
  • 2020-12-03 12:44

    There is a trick using the implicit conversion of VARBINARY to base64 and back:

    Here is your list of evil characters:

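    -- Every C0 control character plus DEL (0x7F).
    -- Note: 0x09, 0x0A, 0x0D and 0x7F are actually legal characters in XML 1.0.
    DECLARE @evilChars VARCHAR(MAX)=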
      CHAR(0x0)
    + CHAR(0x1)
    + CHAR(0x2)
    + CHAR(0x3)
    + CHAR(0x4)
    + CHAR(0x5)
    + CHAR(0x6)
    + CHAR(0x7)
    + CHAR(0x8)
    + CHAR(0x9)
    + CHAR(0xa)
    + CHAR(0xb)
    + CHAR(0xc)
    + CHAR(0xd)
    + CHAR(0xe)
    + CHAR(0xf)
    + CHAR(0x10)
    + CHAR(0x11)
    + CHAR(0x12)
    + CHAR(0x13)
    + CHAR(0x14)
    + CHAR(0x15)
    + CHAR(0x16)
    + CHAR(0x17)
    + CHAR(0x18)
    + CHAR(0x19)
    + CHAR(0x1a)
    + CHAR(0x1b)
    + CHAR(0x1c)
    + CHAR(0x1d)
    + CHAR(0x1e)
    + CHAR(0x1f)
    + CHAR(0x7f);
    

    Serializing them with FOR XML works:

    DECLARE @XmlAsString NVARCHAR(MAX)=
    (
        SELECT @evilChars FOR XML PATH('test')
    );
    SELECT @XmlAsString;
    

    The result (some characters, such as tab and line feed, are output literally rather than as character references):

    <test>&#x00;&#x01;&#x02;&#x03;&#x04;&#x05;&#x06;&#x07;&#x08;    
    &#x0B;&#x0C;&#x0D;&#x0E;&#x0F;&#x10;&#x11;&#x12;&#x13;&#x14;&#x15;&#x16;&#x17;&#x18;&#x19;&#x1A;&#x1B;&#x1C;&#x1D;&#x1E;&#x1F;</test>
    

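    But casting that string to XML fails, because XML 1.0 does not allow most of these control characters, even when they are written as character references: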

    SELECT CAST(@XmlAsString AS XML)
    

    However, FOR XML implicitly converts a VARBINARY value to base64, which uses only XML-safe characters:

    DECLARE @base64 NVARCHAR(MAX)=
    (
        SELECT CAST(@evilChars AS VARBINARY(MAX)) FOR XML PATH('test')
    );
    SELECT @base64;
    

    The result

    <test>AAECAwQFBgcICQoLDA0ODxAREhMUFRYXGBkaGxwdHh9/</test>
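
    If you only need the bare base64 text (for example, to place it inside an element of your own instead of 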
    

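    The base64 text contains only XML-safe characters, so it can be sent as valid XML. To get the original characters back, read the value as VARBINARY(MAX) and cast it to VARCHAR(MAX):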

    SELECT CAST(CAST(@base64 AS XML).value('/test[1]','varbinary(max)') AS VARCHAR(MAX)) FOR XML PATH('reconverted')
    

    The result

    <reconverted>&#x0;&#x1;&#x2;&#x3;&#x4;&#x5;&#x6;&#x7;&#x8;  
    &#xB;&#xC;
    &#xE;&#xF;&#x10;&#x11;&#x12;&#x13;&#x14;&#x15;&#x16;&#x17;&#x18;&#x19;&#x1A;&#x1B;&#x1C;&#x1D;&#x1E;&#x1F;</reconverted>
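
    To check that the round trip is lossless, the decoded value can be compared with the original byte for byte. This sketch repeats the encoding and decoding steps above on a short, made-up sample string; comparing the VARBINARY forms keeps the check independent of collation rules, which can treat some control characters specially in string comparisons.

    ```sql
    -- Made-up sample containing characters that XML 1.0 forbids.
    DECLARE @original VARCHAR(MAX) = 'abc' + CHAR(0x0) + CHAR(0x1) + CHAR(0xB) + CHAR(0x1F) + 'xyz';

    -- Encode: FOR XML renders the VARBINARY value as base64 inside ' THROW 50000, 'unexpected wrapper element', 1;
IF CAST(@decoded AS VARBINARY(MAX)) <> CAST(@original AS VARBINARY(MAX)) THROW 50000, 'round trip mismatch', 1;
</test>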
    
  • 2020-12-03 12:45

    You need to use NVARCHAR(MAX) instead of VARCHAR(MAX), but otherwise the change is fine.

  • 2020-12-03 12:54

    It is safe to use VARCHAR(MAX) because my data column is a VARCHAR(MAX) field. Also, passing a VARCHAR(MAX) value to an SQL function that takes an NVARCHAR(MAX) parameter would add the overhead of converting it to NVARCHAR(MAX).
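
    The function discussed here is not shown in the thread, so the following is only a sketch of what a VARCHAR(MAX)-only version might look like. The name dbo.StripInvalidXmlChars is hypothetical. It removes the control characters that XML 1.0 forbids (everything below 0x20 except tab, line feed and carriage return) and keeps everything else. It assumes the data fits code page 1252, because it applies the Latin1_General_BIN collation so that REPLACE matches each byte exactly; with some non-binary collations CHAR(0) is not matched as expected. For another code page, a binary collation for that code page would be needed instead.

    ```sql
    -- Hypothetical helper: strips 0x00-0x08, 0x0B, 0x0C and 0x0E-0x1F from a VARCHAR(MAX) value.
    -- Tab (0x09), LF (0x0A) and CR (0x0D) are legal in XML 1.0 and are kept.
    -- Assumption: the data uses code page 1252 (the code page of Latin1_General_BIN).
    CREATE FUNCTION dbo.StripInvalidXmlChars (@input VARCHAR(MAX))
    RETURNS VARCHAR(MAX)
    AS
    BEGIN
        DECLARE @result VARCHAR(MAX) = @input;
        DECLARE @code INT = 0;

        WHILE @code < 32
        BEGIN
            IF @code NOT IN (9, 10, 13)
                -- The binary collation makes REPLACE match the byte exactly (important for CHAR(0)).
                SET @result = REPLACE(@result COLLATE Latin1_General_BIN, CHAR(@code), '');
            SET @code += 1;
        END;

        RETURN @result;  -- NULL input gives NULL output
    END;
    GO

    -- Usage: the 0x01 is removed, the tab (0x09) is kept.
    SELECT dbo.StripInvalidXmlChars('a' + CHAR(0x1) + 'b' + CHAR(0x9) + 'c') AS Cleaned;
    ```

    To replace the characters instead of deleting them, the empty string in REPLACE can be changed to a placeholder such as '?'.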

    Thank you very much @RhysJones, @Damien_The_Unbeliever for your comments.
