问题
How can I ignore the DTD declaration when parsing file with XOM xml library. My file has the following line :
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
//rest of stuff here
And when I try to build() my document I get a filenotfound exception for the DTD file. I know I don't have this file and I don't care about it, so how can it be removed when using XOM?
Here is a code snippet:
public BlastXMLParser(String filePath) {
Builder b = new Builder(false);
//not a good idea to have exception-throwing code in constructor
try {
_document = b.build(filePath);
} catch (ParsingException ex) {
Logger.getLogger(BlastXMLParser.class.getName()).log(Level.SEVERE,"err", ex);
} catch (IOException ex) {
//
}
private Elements getBlastReads() {
Element root = _document.getRootElement();
Elements rootChildren = root.getChildElements();
for (int i = 0; i < rootChildren.size(); i++) {
Element child = rootChildren.get(i);
if (child.getLocalName().equals("BlastOutput_iterations")) {
return child.getChildElements();
}
}
return null;
}
}
I get a NullPointerException at this line:
Element root = _document.getRootElement();
With the DTD line removed from the source XML file I can successfully parse it, but this is not an option in the final production system.
回答1:
The preferred solution would be to implement an EntityResolver that intercepts requests for the DTD and redirects these to an embedded copy. If you
- don't have access to the DTD and
- are absolutely sure you won't need it (apart from validation it might also declare character entities that are used in the document) and
- you are using the Xerces XML Parser implementation
you can disable fetching of DTD by setting the corresponding SAX feature. In XOM this should be possible by passing an XMLReader to the Builder constructor like this:
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLReaderFactory;
...
XMLReader xmlreader = XMLReaderFactory.createXMLReader();
xmlreader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
Builder builder = new Builder(xmlreader);
回答2:
According to their documentation this is the way to parse document without any validation.
try {
Builder parser = new Builder();
Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ParsingException ex) {
System.err.println("Cafe con Leche is malformed today. How embarrassing!");
}
catch (IOException ex) {
System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}
If you do want to validate XML schema you have to call new Builder(true)
:
try {
Builder parser = new Builder(true);
Document doc = parser.build("http://www.cafeconleche.org/");
}
catch (ValidityException ex) {
System.err.println("Cafe con Leche is invalid today. (Somewhat embarrassing.)");
}
catch (ParsingException ex) {
System.err.println("Cafe con Leche is malformed today. (How embarrassing!)");
}
catch (IOException ex) {
System.err.println("Could not connect to Cafe con Leche. The site may be down.");
}
Pay attention that now yet another exception can be thrown: ValidityException
来源:https://stackoverflow.com/questions/8081214/ignoring-dtd-when-parsing-xml