Parsing Unicode character (0x2) using XML 1.1

不知归路 2021-01-12 11:56

In my Java application, I need to parse an XML document that contains control character 0x2 inside CDATA.

I tried a few ways but couldn't get through. I wa

2 answers
  • 2021-01-12 12:21

    I need to parse xml that contains control character 0x2 inside CDATA

    That's not XML, then. A raw control character U+0002 anywhere means it's not well-formed and hence not an XML document.

    In XML 1.1 only, one may include control characters encoded as character references. So you might have tried to fix it up by doing a string replace for \x02 with &#2; before parsing. However, you can't put character references in CDATA sections, so that's not going to fly either.

    edit: you could probably fix it in the short-term, if you are absolutely sure that every stray U+0002 character is inside a CDATA section, by replacing each with:

    ]]>&#2;<![CDATA[
    

    However this is super-shonky. Whatever generated the faulty XML in the first place needs to be fixed. Go kick the person responsible for creating it!
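
A minimal Java sketch of that short-term workaround, assuming the input already declares version="1.1" and that every stray U+0002 really does sit inside a CDATA section. The class and method names (ControlCharPatch, patch, parse) are made up for illustration, and the sketch relies on the JAXP parser in use accepting XML 1.1 documents:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ControlCharPatch {
    // Closes the current CDATA section, emits U+0002 as a character
    // reference, then reopens CDATA for the rest of the content.
    static final String SPLICE = "]]>&#2;<![CDATA[";

    // Hypothetical helper: only safe if every U+0002 is inside CDATA.
    static String patch(String rawXml) {
        return rawXml.replace("\u0002", SPLICE);
    }

    // Parses the patched text with the default JAXP DOM parser.
    static Document parse(String xml11) throws Exception {
        DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new InputSource(new StringReader(xml11)));
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input; the 1.1 declaration is required for &#2;.
        String broken =
                "<?xml version=\"1.1\"?><msg><![CDATA[a\u0002b]]></msg>";
        Document doc = parse(patch(broken));
        // The text content is the two CDATA pieces plus the referenced char.
        System.out.println(doc.getDocumentElement().getTextContent().length());
    }
}
```

The replacement runs on the raw text before any XML parser sees it, so it cannot tell CDATA from ordinary markup. That is why it stays a stopgap until the producer is fixed.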

  • 2021-01-12 12:35

    XML cannot contain ASCII control characters (apart from TAB, CR and LF), not even inside a CDATA section. They are disallowed by the XML spec.

    Encode binary data into Base64 strings and write them to XML. No need for CDATA in this case.
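
A rough sketch of the Base64 approach using java.util.Base64 from the JDK. The element name payload and the class Base64Payload are assumptions for illustration only; the point is that the Base64 alphabet contains only characters that are legal in XML text, so no CDATA or escaping is needed:

```java
import java.util.Base64;

final class Base64Payload {
    // Wraps arbitrary bytes (control characters included) in an element
    // as Base64 text. The element name "payload" is hypothetical.
    static String toElement(byte[] payload) {
        String encoded = Base64.getEncoder().encodeToString(payload);
        return "<payload>" + encoded + "</payload>";
    }

    // Recovers the original bytes from the element's text content.
    static byte[] fromElementText(String text) {
        return Base64.getDecoder().decode(text.trim());
    }

    public static void main(String[] args) {
        byte[] data = {'a', 0x02, 'b'};   // contains the problematic 0x2
        String xml = toElement(data);
        System.out.println(xml);
        // Strip the wrapper tags by hand for this demo only; a real
        // consumer would read the text through an XML parser.
        String text = xml.substring("<payload>".length(),
                xml.length() - "</payload>".length());
        System.out.println(fromElementText(text).length);
    }
}
```

Both the writer and the reader of the XML have to agree on this encoding, so it only helps when the producing side can be changed.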
