What are invalid characters in XML

时光说笑 2020-11-22 03:23

I am working with some XML that holds strings like:

This is a string

Some of the strings that I am passing to the

15 Answers
  • 2020-11-22 04:00

    The list of valid characters is in the XML specification:

    Char       ::=      #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]  /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
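
The Char production above translates fairly directly into a check. The following is a minimal Python sketch, not taken from the answer: the names `is_valid_xml_text` and `strip_invalid_xml_chars` are made up for illustration, and it assumes XML 1.0 (XML 1.1 allows a different set of characters).

```python
import re

# Negated character class built from the XML 1.0 Char production:
# it matches any single character that is NOT allowed in an XML 1.0 document.
_INVALID_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def is_valid_xml_text(text: str) -> bool:
    """Return True if every character of text is a legal XML 1.0 Char."""
    return _INVALID_XML_CHARS.search(text) is None


def strip_invalid_xml_chars(text: str) -> str:
    """Drop every character that the Char production excludes."""
    return _INVALID_XML_CHARS.sub("", text)
```

This only tests whether each character is allowed at all. Markup characters such as & and < are legal characters, so they pass this check and still need to be escaped separately.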
    
  • 2020-11-22 04:01

    In addition to potame's answer, here is what applies if you want to use a CDATA block instead of escaping.

    If you put your text in a CDATA block, you don't need to escape it. In that case you can use any character in the following range:

    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]

    Note: On top of that, you're not allowed to use the ]]> character sequence, because it would match the end of the CDATA block.

    If there are still invalid characters (e.g. control characters), then probably it's better to use some kind of encoding (e.g. base64).
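
Both options from this answer can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `to_cdata` and `to_base64` are hypothetical helper names, and splitting ]]> across two adjacent CDATA sections is one common workaround, not something the answer itself prescribes.

```python
import base64


def to_cdata(text: str) -> str:
    """Wrap text in CDATA, splitting any ']]>' so it cannot end the section early."""
    # "]]>" becomes "]]" + end of section + start of a new section + ">".
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def to_base64(data: bytes) -> str:
    """Encode arbitrary bytes (including control characters) as ASCII-safe text."""
    return base64.b64encode(data).decode("ascii")
```

A parser reads the adjacent CDATA sections back as one continuous run of text, so the original ]]> survives the round trip. Note that CDATA does not make invalid characters legal; for those, the base64 route (or removing them) is still needed, and the consumer has to know to decode the value.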

  • 2020-11-22 04:01
    ampersand (&) is escaped to &amp;
    
    double quotes (") are escaped to &quot;
    
    single quotes (') are escaped to &apos;
    
    less than (<) is escaped to &lt; 
    
    greater than (>) is escaped to &gt;
    

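    In C#, use System.Security.SecurityElement.Escape or System.Net.WebUtility.HtmlEncode to escape these special characters. Note that both only escape markup characters; they do not remove characters that are invalid in XML, such as most control characters.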

    string xml = "<node>it's my \"node\" & i like it 0x12 x09 x0A  0x09 0x0A <node>";
    string encodedXml1 = System.Security.SecurityElement.Escape(xml);
    string encodedXml2= System.Net.WebUtility.HtmlEncode(xml);
    
    
    encodedXml1
    "&lt;node&gt;it&apos;s my &quot;node&quot; &amp; i like it 0x12 x09 x0A  0x09 0x0A &lt;node&gt;"
    
    encodedXml2
    "&lt;node&gt;it&#39;s my &quot;node&quot; &amp; i like it 0x12 x09 x0A  0x09 0x0A &lt;node&gt;"
    
  • 2020-11-22 04:05

    Another easy way to escape potentially unwanted XML / XHTML chars in C# is:

    WebUtility.HtmlEncode(stringWithStrangeChars)
    
  • 2020-11-22 04:06

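    The approach from "XmlWriter and lower ASCII characters" worked for me:

    // Commas are literals inside a character class, so they are left out here; otherwise real commas would be stripped too.
    string code = Regex.Replace(item.Code, @"[\u0000-\u0008\u000B\u000C\u000E-\u001F]", "");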
    
  • 2020-11-22 04:06

    For XSL (on really lazy days) I use:

    capture="&amp;(?!amp;)" capturereplace="&amp;amp;"
    

    to translate all & signs that aren't followed by amp; into proper ones.

    We have cases where the input is in CDATA but the system which uses the XML doesn't take it into account. It's a sloppy fix, beware...
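
To make the lookahead trick concrete, here is the same pattern as a small Python sketch. The function names are hypothetical, and the second, broader pattern is an assumption added for illustration rather than part of the original fix.

```python
import re

# Same idea as the answer: an "&" that is not followed by "amp;".
_BARE_AMP = re.compile(r"&(?!amp;)")

# Broader variant (an assumption, not from the answer): also leave other
# named references and numeric character references such as &lt; or &#38; alone.
_UNESCAPED_AMP = re.compile(
    r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_bare_ampersands(text: str) -> str:
    """Replace every '&' that is not already the start of '&amp;'."""
    return _BARE_AMP.sub("&amp;", text)


def escape_unescaped_ampersands(text: str) -> str:
    """Replace only '&' characters that do not start an entity or character reference."""
    return _UNESCAPED_AMP.sub("&amp;", text)
```

The first function shows why the fix is sloppy: it also rewrites the & in an existing &lt;, so already-escaped input gets double-escaped. The broader variant avoids that case, but either way this is a patch over malformed input rather than a substitute for escaping at the source.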
