What are invalid characters in XML

时光说笑 2020-11-22 03:23

I am working with some XML that holds strings like:

This is a string

Some of the strings that I am passing to the

15 Answers
  • 2020-11-22 04:24

    Another way to remove invalid XML characters in C# is to use XmlConvert.IsXmlChar (available since .NET Framework 4.0):

    // Requires: using System.Linq;
    public static string RemoveInvalidXmlChars(string content)
    {
        // Keep only the characters that XmlConvert.IsXmlChar accepts.
        return new string(content.Where(ch => System.Xml.XmlConvert.IsXmlChar(ch)).ToArray());
    }
    

    Alternatively, you can check whether every character is valid in XML:

    // Requires: using System.Linq;
    public static bool CheckValidXmlChars(string content)
    {
        // True only if every character passes XmlConvert.IsXmlChar.
        return content.All(ch => System.Xml.XmlConvert.IsXmlChar(ch));
    }
    


    For example, the vertical tab character (\v) is valid UTF-8 but is not a valid XML 1.0 character. Many libraries (including libxml2) miss this and silently output invalid XML.
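
    For readers outside .NET, the same idea can be sketched in Java. This is only an illustration, not part of the answer above: the class and method names (XmlCharFilter, isXml10Char, removeInvalidXmlChars, hasOnlyValidXmlChars) are hypothetical, and the ranges are taken from the Char production of the XML 1.0 specification. The sketch works on code points rather than UTF-16 chars so that characters outside the Basic Multilingual Plane are kept.

```java
final class XmlCharFilter {

    /** Returns true if the code point matches the XML 1.0 Char production. */
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Drops every code point that XML 1.0 does not allow. */
    static String removeInvalidXmlChars(String content) {
        StringBuilder sb = new StringBuilder(content.length());
        content.codePoints()
               .filter(XmlCharFilter::isXml10Char)
               .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    /** Returns true only if every code point is allowed in XML 1.0. */
    static boolean hasOnlyValidXmlChars(String content) {
        return content.codePoints().allMatch(XmlCharFilter::isXml10Char);
    }

    public static void main(String[] args) {
        // The middle character is a vertical tab (U+000B).
        String dirty = "ok\u000Bstill ok";
        System.out.println(hasOnlyValidXmlChars(dirty));
        System.out.println(removeInvalidXmlChars(dirty));
    }
}
```

    Run as written, main should print false and then okstill ok, because the vertical tab falls outside the allowed ranges while tab, line feed and carriage return are accepted.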

  • 2020-11-22 04:25

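    Of the markup characters, the only ones that cannot appear literally in text are & and < (> is usually escaped as well), plus " or ' inside an attribute value delimited by that same quote character.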

    They're escaped using XML entities. In this case you want &amp; for &.
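
    As a rough sketch of what that escaping looks like when done by hand, the Java snippet below replaces the five characters with their predefined entities. The class and method names (XmlEscaper, escape) are hypothetical, and the sketch deliberately escapes both quote characters everywhere, which is more than text content strictly needs but is safe for attribute values too.

```java
final class XmlEscaper {

    /** Replaces the five XML markup characters with their predefined entities. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("Tom & Jerry <3"));
    }
}
```

    Note that this only handles markup characters. It does not remove code points such as control characters that XML forbids outright.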

    Really, though, you should use a tool or library that writes the XML for you and handles escaping, so you don't have to worry about it.
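
    One way to follow that advice in Java is the StAX XMLStreamWriter that ships with the JDK. The sketch below is an assumption about how one might wrap it, not something from the answer: StaxEscapeDemo and writeElement are hypothetical names, and checked exceptions are rethrown as unchecked ones to keep the example short.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

final class StaxEscapeDemo {

    /** Writes a single element and lets the writer escape the text content. */
    static String writeElement(String name, String text) {
        StringWriter out = new StringWriter();
        try {
            XMLStreamWriter writer =
                XMLOutputFactory.newInstance().createXMLStreamWriter(out);
            writer.writeStartElement(name);
            writer.writeCharacters(text); // escaping happens inside the writer
            writer.writeEndElement();
            writer.flush();
            writer.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not write XML", e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(writeElement("note", "Fish & chips < 5"));
    }
}
```

    The caller passes raw text and the writer emits &amp; and &lt; where needed. Whether a given writer also rejects forbidden control characters depends on the implementation, so it is worth checking that separately.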

  • 2020-11-22 04:25

    In the Woodstox XML processor, invalid characters are classified by this code:

    if (c == 0) {
        throw new IOException("Invalid null character in text to output");
    }
    if (c < ' ' || (c >= 0x7F && c <= 0x9F)) {
        String msg = "Invalid white space character (0x" + Integer.toHexString(c) + ") in text to output";
        if (mXml11) {
            msg += " (can only be output using character entity)";
        }
        throw new IOException(msg);
    }
    if (c > 0x10FFFF) {
        throw new IOException("Illegal unicode character point (0x" + Integer.toHexString(c) + ") to output; max is 0x10FFFF as per RFC");
    }
    /*
     * Surrogate pair in non-quotable (not text or attribute value) content, and non-unicode encoding (ISO-8859-x,
     * Ascii)?
     */
    if (c >= SURR1_FIRST && c <= SURR2_LAST) {
        throw new IOException("Illegal surrogate pair -- can only be output via character entities, which are not allowed in this content");
    }
    throw new IOException("Invalid XML character (0x"+Integer.toHexString(c)+") in text to output");
    

