Question
I'm trying to parse a String which contains XML content which conforms to the XML 1.1 spec. The XML contains character references which are not allowed in the XML 1.0 spec but which are allowed in the XML 1.1 spec (character references which translate to Unicode characters in the range U+0001–U+001F).
According to the Xerces2 website, the Xerces2 parser supports parsing XML 1.1 documents. However, I cannot figure out how to tell it that the XML we are trying to parse is XML 1.1.
I'm using a DocumentBuilder to parse the XML (something like this):
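import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public Element parseString(String xmlString) {
    try {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder documentBuilder = dbf.newDocumentBuilder();
        InputSource source = new InputSource(new StringReader(xmlString));
        // Throws org.xml.sax.SAXParseException because of the invalid character refs
        Document doc = documentBuilder.parse(source);
        return doc.getDocumentElement();
    } catch (ParserConfigurationException pce) {
        // Handle the error
    } catch (SAXException se) {
        // Handle the error
    } catch (IOException ioe) {
        // Handle the error
    }
    // Reached only when one of the handlers above swallows the exception
    return null;
}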
I've tried setting the XML header to indicate the XML conforms to the 1.1 spec...
xmlString = "<?xml version=\"1.1\" encoding=\"UTF-8\" ?>" + xmlString;
...but it is still parsed as 1.0 XML (still generates the invalid character reference exceptions).
How can I configure the Xerces parser to parse the XML as XML 1.1? Is there an alternative parser which provides better support for XML 1.1?
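Answer 1:
See the Xerces documentation for a list of all the features the parser supports. The two features below may be the relevant ones; note that the second is listed as read-only, so it can be queried but not set.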
http://xml.org/sax/features/unicode-normalization-checking
True: Perform Unicode normalization checking (as described in section 2.13 and Appendix B of the XML 1.1 Recommendation) and report normalization errors.
False: Do not report Unicode normalization errors.
http://xml.org/sax/features/xml-1.1
True: The parser supports both XML 1.0 and XML 1.1.
False: The parser supports only XML 1.0.
Access: read-only
Since: Xerces-J 2.7.0
Note: The value of this feature will depend on whether the parser configuration owned by the SAX parser is known to support XML 1.1.
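As a rough illustration of the feature described above, the sketch below queries http://xml.org/sax/features/xml-1.1 on whatever SAX parser JAXP returns by default, then parses the same content with and without an XML 1.1 declaration. The class name Xml11Check and its two helper methods are hypothetical names made up for this example. It assumes the default JAXP implementation is Xerces-derived (as the parser bundled with the JDK is); a different implementation may not recognize the feature, in which case the helper simply reports false.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

// Hypothetical helper class, for illustration only.
class Xml11Check {
    static final String XML11_FEATURE = "http://xml.org/sax/features/xml-1.1";

    // Returns the value of the read-only xml-1.1 feature, or false if the
    // parser does not recognize the feature or cannot be created.
    static boolean supportsXml11() {
        try {
            XMLReader reader = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
            return reader.getFeature(XML11_FEATURE);
        } catch (Exception e) {
            return false;
        }
    }

    // Parses the string with the default DocumentBuilder and returns the text
    // content of the root element, or null if parsing fails for any reason.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // &#x1; is a character reference that XML 1.0 forbids and XML 1.1 allows.
        String body = "<a>&#x1;</a>";
        System.out.println("xml-1.1 feature: " + supportsXml11());
        System.out.println("parses without declaration: " + (rootText(body) != null));
        System.out.println("parses with 1.1 declaration: "
                + (rootText("<?xml version=\"1.1\"?>" + body) != null));
    }
}
```

The XML declaration is only recognized when it is the very first thing in the input, so prepending it to a string that already starts with its own declaration or with leading whitespace would not switch the parser to XML 1.1.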
Answer 2:
I'm not sure how to do this with Xerces, but Woodstox supports XML 1.1 out of the box. While it is primarily a StAX parser, it also implements the SAX API (since version 3.2).
Source: https://stackoverflow.com/questions/9312517/how-can-i-parse-xml-that-confirms-to-the-1-1-spec-using-java-and-xerces