Unicode(0xb) error while parsing an XML file using Stax

半阙折子戏 2021-01-18 20:38

While parsing an XML file, StAX produces an error:

Unicode(0xb) error-An invalid XML character (Unicode: 0xb) was found in the element content of the d

3 Answers
  • 2021-01-18 20:45

    According to the W3C XML 1.0 Recommendation, 0xB is not allowed in an XML file:

    Character Range [2] Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */

    So strictly speaking your input file is not an XML file.
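
    As a rough illustration, the Char production quoted above can be written as a code point check. This is only a sketch: the class `XmlChars` and the methods `isValidXmlChar` and `firstInvalidIndex` are hypothetical names chosen for this example, not part of StAX or the JDK.

    ```java
    // Sketch of the XML 1.0 "Char" production as a code point check.
    // XmlChars, isValidXmlChar and firstInvalidIndex are hypothetical names.
    final class XmlChars {
        private XmlChars() {}

        /** Returns true if the code point is allowed by the XML 1.0 Char production. */
        static boolean isValidXmlChar(int cp) {
            return cp == 0x9 || cp == 0xA || cp == 0xD
                    || (cp >= 0x20 && cp <= 0xD7FF)
                    || (cp >= 0xE000 && cp <= 0xFFFD)
                    || (cp >= 0x10000 && cp <= 0x10FFFF);
        }

        /** Returns the char index of the first invalid code point, or -1 if there is none. */
        static int firstInvalidIndex(String text) {
            for (int i = 0; i < text.length(); ) {
                int cp = text.codePointAt(i);
                if (!isValidXmlChar(cp)) {
                    return i;
                }
                i += Character.charCount(cp); // step over surrogate pairs as one code point
            }
            return -1;
        }

        public static void main(String[] args) {
            String sample = "ab" + (char) 0x0B + "cd"; // contains a vertical tab
            System.out.println(isValidXmlChar(0x0B));      // false
            System.out.println(firstInvalidIndex(sample)); // 2
        }
    }
    ```

    Running a check like this over the input before handing it to the parser shows where the offending character sits.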

  • 2021-01-18 21:05

    The parser raises this error whenever the XML contains an invalid XML character. If you open the file in Notepad++, these characters show up as control codes such as VT, SOH, or FF. I am using XML version 1.0, and I validate text data before storing it in the database with this pattern:

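    // Regex-level escapes (\\u, \\x{...}) so javac does not pre-process them; needs java.util.regex.Pattern.
    Pattern p = Pattern.compile("[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFD\\x{10000}-\\x{10FFFF}]+");
    retunContent = p.matcher(retunContent).replaceAll("");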
    

    This ensures that no invalid characters end up in the XML.
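
    To show how the stripping step fits together with StAX, here is a minimal sketch that removes invalid characters from a string and then reads it with `XMLStreamReader`. The class `SanitizedStaxDemo` and the helpers `stripInvalidXmlChars` and `readText` are hypothetical names for this example, and it assumes the whole document fits in memory as a `String`.

    ```java
    import java.io.StringReader;
    import java.util.regex.Pattern;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamException;
    import javax.xml.stream.XMLStreamReader;

    // Hypothetical demo class: strip invalid characters, then parse with StAX.
    final class SanitizedStaxDemo {
        // Matches runs of code points outside the XML 1.0 Char production.
        private static final Pattern INVALID_XML_CHARS = Pattern.compile(
                "[^\\x{09}\\x{0A}\\x{0D}\\x{20}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]+");

        /** Removes every character that XML 1.0 does not allow. */
        static String stripInvalidXmlChars(String text) {
            return INVALID_XML_CHARS.matcher(text).replaceAll("");
        }

        /** Concatenates all character data in the document using StAX. */
        static String readText(String xml) throws XMLStreamException {
            XMLStreamReader reader =
                    XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
            return text.toString();
        }

        public static void main(String[] args) throws XMLStreamException {
            // Element content with a vertical tab (0xB) between "a" and "b".
            String raw = "<d>a" + (char) 0x0B + "b</d>";
            System.out.println(readText(stripInvalidXmlChars(raw)));
        }
    }
    ```

    Silently dropping characters changes the data, so replacing them with a space or logging their position may be a better fit depending on the use case.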

  • 2021-01-18 21:09

    0xB (vertical tab) is not a valid character in XML. The only valid characters below ASCII 32 (0x20, space) are 0x9 (tab), 0xA (line feed) and 0xD (carriage return).

    In short, what you are trying to parse is NOT XML.
