How can I parse XML that conforms to the 1.1 spec using Java and Xerces?

Submitted by £可爱£侵袭症+ on 2019-11-28 01:33:19

Question


I'm trying to parse a String containing XML that conforms to the XML 1.1 spec. The XML contains character references which are not allowed in the XML 1.0 spec but which are allowed in the XML 1.1 spec (character references which translate to Unicode characters in the range U+0001–U+001F).

According to the Xerces2 website, the Xerces2 parser supports parsing XML 1.1 documents. However, I cannot figure out how to tell it that the XML we are trying to parse is 1.1-compliant.

I'm using a DocumentBuilder to parse the XML (something like this):

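import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public Element parseString(String xmlString) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = dbf.newDocumentBuilder();

        InputSource source = new InputSource(new StringReader(xmlString));

        // Throws org.xml.sax.SAXParseException because of the invalid character refs
        Document doc = documentBuilder.parse(source);

        return doc.getDocumentElement();

    } catch (ParserConfigurationException pce) {
        // Handle the error
    } catch (SAXException se) {
        // Handle the error
    } catch (IOException ioe) {
        // Handle the error
    }
    // Reached only when one of the handlers above swallowed an exception
    return null;
}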

I've tried setting the XML header to indicate the XML conforms to the 1.1 spec...

xmlString = "<?xml version=\"1.1\" encoding=\"UTF-8\" ?>" + xmlString;

...but it is still parsed as 1.0 XML (still generates the invalid character reference exceptions).

How can I configure the Xerces parser to parse the XML as XML 1.1? Is there an alternative parser which provides better support for XML 1.1?


Answer 1:


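See here for a list of all the features supported by Xerces. The two features below may be the relevant ones. Note that, according to its description, the second one is read-only, so it can be queried but not turned on.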

http://xml.org/sax/features/unicode-normalization-checking

True: Perform Unicode normalization checking (as described in section 2.13 and Appendix B of the XML 1.1 Recommendation) and report normalization errors.

False: Do not report Unicode normalization errors.

http://xml.org/sax/features/xml-1.1

True: The parser supports both XML 1.0 and XML 1.1.

False: The parser supports only XML 1.0.

Access: read-only

Since: Xerces-J 2.7.0

Note: The value of this feature will depend on whether the parser configuration owned by the SAX parser is known to support XML 1.1.
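One way to see what the installed parser actually does is to query the read-only feature and then try the same body under both version declarations. The following is only a minimal sketch: the class name Xml11Probe and its helper methods are made up for illustration, and it uses whichever parser JAXP selects at runtime, which may or may not be the Xerces build you intend to test.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

// Hypothetical helper class, not part of Xerces or JAXP.
public class Xml11Probe {
    // Feature ID quoted in the answer above; documented as read-only.
    static final String XML_11_FEATURE = "http://xml.org/sax/features/xml-1.1";

    /** Returns true if the JAXP-selected SAX parser reports XML 1.1 support. */
    public static boolean reportsXml11Support() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return reader.getFeature(XML_11_FEATURE);
        } catch (ParserConfigurationException | SAXException e) {
            // Feature not recognized, or the parser could not be created.
            return false;
        }
    }

    /** Returns true if the JAXP-selected DOM parser accepts the given document. */
    public static boolean canParse(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // &#x1; is a reference to U+0001, which XML 1.0 forbids.
        String body = "<root>&#x1;</root>";
        System.out.println("xml-1.1 feature: " + reportsXml11Support());
        System.out.println("as XML 1.0: " + canParse("<?xml version=\"1.0\"?>" + body));
        System.out.println("as XML 1.1: " + canParse("<?xml version=\"1.1\"?>" + body));
    }
}
```

If the last line prints false even though the feature reports true, the declaration is probably not reaching the parser as the very first characters of the input, which is worth checking before switching parsers.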




Answer 2:


Not sure how to do this with Xerces, but Woodstox supports XML 1.1 out of the box. While it is primarily a StAX parser, it also implements the SAX API (since version 3.2).
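As a rough illustration of the StAX route, the sketch below reads the root element's text through the standard javax.xml.stream API. It assumes that Woodstox, when present on the classpath, is the implementation returned by XMLInputFactory.newInstance(); without it the JDK's built-in StAX parser is used, and its XML 1.1 behaviour may differ. The class name StaxRootText and the method rootTextOrNull are hypothetical.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class; only the javax.xml.stream calls are standard API.
public class StaxRootText {
    /**
     * Returns the text of the root element (which must contain only text),
     * or null if the document has no element or cannot be parsed.
     */
    public static String rootTextOrNull(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        // Character references are expanded in the returned text.
                        return reader.getElementText();
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            // The active StAX implementation rejected the document.
            return null;
        }
    }

    public static void main(String[] args) {
        String text = rootTextOrNull("<?xml version=\"1.1\"?><root>&#x1;</root>");
        System.out.println(text == null
                ? "rejected by the active StAX implementation"
                : "parsed, text length = " + text.length());
    }
}
```

Returning null on failure keeps the sketch short; real code would more likely propagate the XMLStreamException so that the location of the offending character is not lost.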



Source: https://stackoverflow.com/questions/9312517/how-can-i-parse-xml-that-confirms-to-the-1-1-spec-using-java-and-xerces
