TransformerFactory - avoiding network lookups to verify DTDs

Submitted by 偶尔善良 on 2019-12-07 14:24:50

Question


I need to transform XML documents offline. I have been able to stop DTD network lookups when loading the original XML file with the following:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setValidating(false);
factory.setNamespaceAware(true);
factory.setFeature("http://xml.org/sax/features/namespaces", false);
factory.setFeature("http://xml.org/sax/features/validation", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// open up the xml document
DocumentBuilder docbuilder = factory.newDocumentBuilder();
Document doc = docbuilder.parse(new FileInputStream(m_strFilePath));

However, I am unable to apply this to the TransformerFactory object. The DTDs are available locally, but I do not know how to direct the transformer to look at the local files as opposed to trying to do a network lookup.

From what I can see, the transformer needs these documents to correctly do the transformation.

For information, I am transforming MusicXML documents from Partwise to Timewise.

As you have probably guessed, XSLT is not my strong point (far from it).

Do I need to modify the XSLT files to reference local files, or can this be done differently?


Further to the comments below, here is an excerpt of the XSL file. It is the only place I can see that refers to an external file:

<!--
  XML output, with a DOCTYPE referring to the timewise DTD.
  Here we use the full Internet URL. 
-->
<xsl:output method="xml" indent="yes" encoding="UTF-8"
    omit-xml-declaration="no" standalone="no"
    doctype-system="http://www.musicxml.org/dtds/timewise.dtd"
    doctype-public="-//Recordare//DTD MusicXML 2.0 Timewise//EN" />

Is the technique mentioned valid for this as well?

The DTD file contains references to a number of MOD files like this:

<!ENTITY % layout PUBLIC
    "-//Recordare//ELEMENTS MusicXML 2.0 Layout//EN"
    "layout.mod">

I presume that these files will be imported in turn as well.


Answer 1:


OK, here is the answer that works for me.

Step 1: load the original document, turning off validation and DTD loading in the factory.

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
// stop the network loading of DTD files
factory.setValidating(false);
factory.setNamespaceAware(true);
// caution: the next line contradicts setNamespaceAware(true); which setting wins depends on the parser implementation
factory.setFeature("http://xml.org/sax/features/namespaces", false);
factory.setFeature("http://xml.org/sax/features/validation", false);
// the two apache.org features are specific to Xerces-based parsers (including the JDK's built-in one)
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
// open up the xml document
DocumentBuilder docbuilder = factory.newDocumentBuilder();
Document doc = docbuilder.parse(new FileInputStream(m_strFilePath));

Step 2: now that the document is in memory, and after detecting that it needs to be transformed:

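TransformerFactory transformfactory = TransformerFactory.newInstance();
// a File-based StreamSource carries a system ID (so relative xsl:import/xsl:include can resolve)
// and leaves no FileInputStream for the caller to close
Templates xsl = transformfactory.newTemplates(new StreamSource(new File((String) m_XslFile)));
Transformer transformer = xsl.newTransformer();
// empty document that receives the transformation result
Document newdoc = docbuilder.newDocument();
Result XmlResult = new DOMResult(newdoc);
// now transform
transformer.transform(
        new DOMSource(doc.getDocumentElement()),
        XmlResult);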

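I transform into a DOMResult because further processing follows and I did not want the overhead of writing to a file and reloading it.

A little explanation:

The trick is to hand the transformer the original DOM object, which was built by a parser with all the validation and DTD-loading features turned off. Because the input is a DOMSource, the transformer does not parse the XML again and so never tries to fetch the DTD. You can see this here: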

transformer.transform(
        new DOMSource(doc.getDocumentElement()),  // <<-----
        XmlResult);

This has been tested with network access turned off, so I know that there are no more network lookups.
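
For reference, the two steps can be combined into one self-contained class. This is only a sketch: the class name OfflineTransform and the method transformOffline are made up for illustration, the inputs are strings rather than files, and it passes the whole Document to DOMSource instead of the document element. It assumes a Xerces-based parser such as the JDK's built-in one, because the two apache.org features are not part of the JAXP standard.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of the original answer.
final class OfflineTransform {

    /** Parses xml without fetching its external DTD, then applies xslt to the in-memory DOM. */
    static Document transformOffline(String xml, String xslt) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setValidating(false);
        factory.setNamespaceAware(true);
        // Xerces-specific features: skip the external DTD entirely.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);

        DocumentBuilder builder = factory.newDocumentBuilder();
        Document input = builder.parse(new InputSource(new StringReader(xml)));

        Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xslt)));

        // The transformer reads the already-built DOM, so the XML is not parsed a second time.
        Document output = builder.newDocument();
        transformer.transform(new DOMSource(input), new DOMResult(output));
        return output;
    }
}
```

The returned Document plays the role of newdoc above.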

However, if the DTDs, MODs, etc. are available locally, then, as per the suggestions, the use of an EntityResolver is the answer. It should be applied, again, to the original docbuilder object.
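
A minimal sketch of such an EntityResolver follows. The class name LocalDtdResolver and the idea of keeping every DTD and MOD file in one flat local directory are assumptions made for illustration; they are not from the original code. The resolver takes the file name from the requested system ID and looks it up in that directory, and it fails rather than fall back to the network when no local copy exists.

```java
import java.io.File;
import java.io.IOException;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical resolver: maps e.g. http://www.musicxml.org/dtds/timewise.dtd
// (and relative references such as layout.mod) to files in a local directory.
final class LocalDtdResolver implements EntityResolver {

    private final File localDir;

    LocalDtdResolver(File localDir) {
        this.localDir = localDir;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId)
            throws SAXException, IOException {
        if (systemId == null) {
            return null; // nothing to map; let the parser decide
        }
        // Keep only the last path segment of the system ID.
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        File local = new File(localDir, name);
        if (local.isFile()) {
            // The parser opens this file: URI instead of the original URL.
            return new InputSource(local.toURI().toString());
        }
        // Returning null here would trigger the default (network) lookup, so fail loudly.
        throw new SAXException("No local copy of " + systemId);
    }
}
```

It would be installed with docbuilder.setEntityResolver(new LocalDtdResolver(new File("/path/to/dtds"))) before calling docbuilder.parse(...). In that case the load-external-dtd feature needs to stay enabled, otherwise the parser skips the external DTD and never asks the resolver.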

I now have a transformed document stored in newdoc, ready to play with.

I hope this will help others.




Answer 2:


You can use a library like Apache xml-commons-resolver and write a catalog file to map web URLs to your local copy of the relevant files. To wire this catalog up to the transformer mechanism you would need to use a SAXSource instead of a StreamSource as the source of your stylesheet:

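// CatalogResolver implements both EntityResolver and URIResolver
CatalogResolver resolver = new CatalogResolver();
// SAXSource.getXMLReader() returns null unless a reader was supplied, so create one explicitly
SAXParserFactory spf = SAXParserFactory.newInstance();
spf.setNamespaceAware(true);
XMLReader reader = spf.newSAXParser().getXMLReader();
reader.setEntityResolver(resolver);
SAXSource styleSource = new SAXSource(reader, new InputSource("file:/path/to/stylesheet.xsl"));
TransformerFactory tf = TransformerFactory.newInstance();
tf.setURIResolver(resolver);
Transformer transformer = tf.newTransformer(styleSource);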
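
As an alternative to xml-commons-resolver, Java 9 and later ship a catalog API in the javax.xml.catalog package. The sketch below uses that JDK API, not the Apache library, so note that its CatalogResolver is a different class from the one above. The class name JdkCatalogExample, its method names, and the catalog file layout shown in the comment are assumptions for illustration.

```java
import java.net.URI;
import java.nio.file.Path;
import javax.xml.catalog.CatalogFeatures;
import javax.xml.catalog.CatalogManager;
import javax.xml.catalog.CatalogResolver;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

// Hypothetical helper built on the JDK's own catalog support (Java 9+).
//
// An OASIS XML catalog for it could contain entries such as:
//   <catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
//     <system systemId="http://www.musicxml.org/dtds/timewise.dtd" uri="timewise.dtd"/>
//   </catalog>
// where a relative uri is resolved against the location of the catalog file.
final class JdkCatalogExample {

    /** Creates a resolver backed by the given catalog file. */
    static CatalogResolver newResolver(Path catalogFile) {
        URI catalogUri = catalogFile.toAbsolutePath().toUri();
        // With the default features an unmatched identifier raises a CatalogException
        // instead of silently going to the network.
        return CatalogManager.catalogResolver(CatalogFeatures.defaults(), catalogUri);
    }

    /** Creates a DocumentBuilder that resolves DTDs through the catalog. */
    static DocumentBuilder newOfflineBuilder(Path catalogFile) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        // javax.xml.catalog.CatalogResolver implements org.xml.sax.EntityResolver.
        builder.setEntityResolver(newResolver(catalogFile));
        return builder;
    }
}
```

Every DTD and MOD file the parser may request needs its own catalog entry, since the default strict mode rejects anything the catalog does not list.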



Answer 3:


The usual way to do this in Java is to use an LSResourceResolver to resolve the system ID (and/or public ID) to your local file. This is documented at http://docs.oracle.com/javase/7/docs/api/org/w3c/dom/ls/LSResourceResolver.html. You shouldn't need anything outside of standard Java XML parser features to get this working.
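
A rough sketch of that approach, using only classes from the JDK, might look like the following. The class name LocalLsResolver and the flat local directory of DTD and MOD files are assumptions for illustration. An LSResourceResolver plugs into the DOM Load and Save parser (LSParser) through the "resource-resolver" configuration parameter rather than into a DocumentBuilder.

```java
import java.io.File;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSInput;
import org.w3c.dom.ls.LSParser;
import org.w3c.dom.ls.LSResourceResolver;

// Hypothetical resolver: redirects a system ID to a same-named file in a local directory.
final class LocalLsResolver implements LSResourceResolver {

    private final DOMImplementationLS ls;
    private final File localDir;

    LocalLsResolver(DOMImplementationLS ls, File localDir) {
        this.ls = ls;
        this.localDir = localDir;
    }

    @Override
    public LSInput resolveResource(String type, String namespaceURI,
                                   String publicId, String systemId, String baseURI) {
        if (systemId == null) {
            return null;
        }
        String name = systemId.substring(systemId.lastIndexOf('/') + 1);
        File local = new File(localDir, name);
        if (!local.isFile()) {
            return null; // no local copy: the parser falls back to its default lookup
        }
        LSInput input = ls.createLSInput();
        input.setPublicId(publicId);
        input.setSystemId(local.toURI().toString());
        return input;
    }

    /** Creates an LSParser that consults the resolver for external resources. */
    static LSParser newParser(File localDir) throws Exception {
        DOMImplementationLS ls = (DOMImplementationLS)
                DOMImplementationRegistry.newInstance().getDOMImplementation("LS");
        LSParser parser = ls.createLSParser(DOMImplementationLS.MODE_SYNCHRONOUS, null);
        parser.getDomConfig().setParameter("resource-resolver", new LocalLsResolver(ls, localDir));
        return parser;
    }
}
```

The document would then be loaded with parser.parseURI(...) instead of docbuilder.parse(...). Unlike a resolver that throws, this one returns null for unknown files, which allows a network lookup for anything missing from the local directory.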



Source: https://stackoverflow.com/questions/18966597/transformerfactory-avoiding-network-lookups-to-verify-dtds
