How to read and write XML files and treat the comment nodes as text nodes in Java when saving

Submitted by 夏天 on 2020-01-30 08:06:12

Question


I'm reading an XML file in Java that is retrieved from an external system, processing it, and eventually saving it locally and deploying it back.

The external system gives me an XML file that contains this node:

    <customApplications>
        <label><!-- GDPR Management --></label>
        <name>GDPR_Management</name>
    </customApplications>

The problem is the comment node. When I read the file and then just save it, the result looks like this:

    <customApplications>
        <label>
            <!-- GDPR Management -->
        </label>
        <name>GDPR_Management</name>
    </customApplications>

This is a problem because, when I deploy the file back to the external system, the system now thinks that the label has some text content. So I need the output to match the original, i.e. without the line breaks around the comment node.

I tried removing all the comment nodes, which works well when deploying the file. However, the file is also versioned with git, and this produces many merge conflicts, because the file can be retrieved again from the external system at any time (the retrieved file again contains the comment nodes, as in the first example).

Then I tried to change all the comment nodes to text nodes before saving. The result is again not acceptable, because the label again has some text content:

    <customApplications>
        <label>&lt;!--  GDPR Management  --&gt;</label>
        <name>GDPR_Management</name>
    </customApplications>

How I read the document:

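    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.NodeList;

    var docBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    var document = docBuilder.parse(inputStream);
    document.getDocumentElement().normalize();
    // Select every whitespace-only text node and remove it from the tree.
    var xp = XPathFactory.newInstance().newXPath();
    var nl = (NodeList) xp.evaluate("//text()[normalize-space(.)='']", document, XPathConstants.NODESET);
    for (int i = 0; i < nl.getLength(); ++i) {
        var node = nl.item(i);
        node.getParentNode().removeChild(node);
    }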

How I save the document:

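    import javax.xml.transform.OutputKeys;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;

    var result = new StreamResult(outputStream);
    var transformer = TransformerFactory.newInstance().newTransformer();
    // INDENT=yes lets the serializer add line breaks and indentation on output.
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "no");
    transformer.setOutputProperty(OutputKeys.VERSION, "1.0");
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.transform(new DOMSource(document), result);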

I really need the same result as the first example, but I do not care how the comment node is represented in the DOM while the file is being processed.

Thanks for any tips!


Answer 1:


Don't use indent="yes" (in your code, OutputKeys.INDENT set to "yes") if you want the output to be identical to the original. Specifying indent="yes" allows the serializer to insert whitespace pretty much anywhere it wants.
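
As a rough illustration of that advice, here is a minimal sketch that parses the snippet from the question and writes it back with indentation turned off. The class name `NoIndentRoundTrip` and the helper `roundTrip` are made up for this example, and the XML declaration is omitted only to keep the printed output short. It assumes the JDK's built-in parser and transformer.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;

// Hypothetical demo class, not part of the original question or answer.
public class NoIndentRoundTrip {

    // Parses the given XML and serializes it again without indentation.
    static String roundTrip(String xml) {
        try {
            var builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            var document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            var transformer = TransformerFactory.newInstance().newTransformer();
            // The key point: do not ask the serializer to indent.
            transformer.setOutputProperty(OutputKeys.INDENT, "no");
            // Assumption for the demo only: skip the declaration for brevity.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            var out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("XML round trip failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<customApplications>\n"
                + "    <label><!-- GDPR Management --></label>\n"
                + "    <name>GDPR_Management</name>\n"
                + "</customApplications>";
        System.out.println(roundTrip(xml));
    }
}
```

Note that this sketch does not strip the whitespace-only text nodes the way the reading code in the question does. With indentation off, the serializer adds no line breaks of its own, so removing those nodes would probably collapse the whole document onto one line. If the goal is output that matches the retrieved file, it may be simplest to leave the original whitespace in place.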



Source: https://stackoverflow.com/questions/58149533/how-to-read-and-write-xml-files-and-treat-the-comment-nodes-as-text-nodes-in-jav
