Pretty print XML in java 8

女生的网名这么多〃 提交于 2019-11-28 06:51:31
Aldo

I guess that the problem is related to blank text nodes (i.e. text nodes with only whitespaces) in the original file. You should try to programmatically remove them just after the parsing, using the following code. If you don't remove them, the Transformer is going to preserve them.

original.getDocumentElement().normalize();
XPathExpression xpath = XPathFactory.newInstance().newXPath().compile("//text()[normalize-space(.) = '']");
NodeList blankTextNodes = (NodeList) xpath.evaluate(original, XPathConstants.NODESET);

for (int i = 0; i < blankTextNodes.getLength(); i++) {
     blankTextNodes.item(i).getParentNode().removeChild(blankTextNodes.item(i));
}
Stephan

In reply to Espinosa's comment, here is a solution when "the original xml is not already (partially) indented or contain new lines".

Background

Excerpt from the article (see References below) inspiring this solution:

Based on the DOM specification, whitespaces outside the tags are perfectly valid and they are properly preserved. To remove them, we can use XPath’s normalize-space to locate all the whitespace nodes and remove them first.

Java Code

public static String toPrettyString(String xml, int indent) {
    try {
        // Turn xml string into a document
        Document document = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));

        // Remove whitespaces outside tags
        document.normalize();
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
                                                      document,
                                                      XPathConstants.NODESET);

        for (int i = 0; i < nodeList.getLength(); ++i) {
            Node node = nodeList.item(i);
            node.getParentNode().removeChild(node);
        }

        // Setup pretty print options
        TransformerFactory transformerFactory = TransformerFactory.newInstance();
        transformerFactory.setAttribute("indent-number", indent);
        Transformer transformer = transformerFactory.newTransformer();
        transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");

        // Return pretty print xml string
        StringWriter stringWriter = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(stringWriter));
        return stringWriter.toString();
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

Sample usage

String xml = "<root>" + //
             "\n   "  + //
             "\n<name>Coco Puff</name>" + //
             "\n        <total>10</total>    </root>";

System.out.println(toPrettyString(xml, 4));

Output

<root>
    <name>Coco Puff</name>
    <total>10</total>
</root>

References

This works on Java 8:

public static void main (String[] args) throws Exception {
    String xmlString = "<hello><from>ME</from></hello>";
    DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
    DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
    Document document = documentBuilder.parse(new InputSource(new StringReader(xmlString)));
    pretty(document, System.out, 2);
}

private static void pretty(Document document, OutputStream outputStream, int indent) throws Exception {
    TransformerFactory transformerFactory = TransformerFactory.newInstance();
    Transformer transformer = transformerFactory.newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    if (indent > 0) {
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", Integer.toString(indent));
    }
    Result result = new StreamResult(outputStream);
    Source source = new DOMSource(document);
    transformer.transform(source, result);
}

I've written a simple class for for removing whitespace in documents - supports command-line and does not use DOM / XPath.

Edit: Come to think of it, the project also contains a pretty-printer which handles existing whitespace:

PrettyPrinter prettyPrinter = PrettyPrinterBuilder.newPrettyPrinter().ignoreWhitespace().build();

I didn't like any of the common XML formatting solutions because they all remove more than 1 consecutive new line character (for some reason, removing spaces/tabs and removing new line characters are inseparable...). Here's my solution, which was actually made for XHTML but should do the job with XML as well:

public String GenerateTabs(int tabLevel) {
  char[] tabs = new char[tabLevel * 2];
  Arrays.fill(tabs, ' ');

  //Or:
  //char[] tabs = new char[tabLevel];
  //Arrays.fill(tabs, '\t');

  return new String(tabs);
}

public String FormatXHTMLCode(String code) {
  // Split on new lines.
  String[] splitLines = code.split("\\n", 0);

  int tabLevel = 0;

  // Go through each line.
  for (int lineNum = 0; lineNum < splitLines.length; ++lineNum) {
    String currentLine = splitLines[lineNum];

    if (currentLine.trim().isEmpty()) {
      splitLines[lineNum] = "";
    } else if (currentLine.matches(".*<[^/!][^<>]+?(?<!/)>?")) {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];

      ++tabLevel;
    } else if (currentLine.matches(".*</[^<>]+?>")) {
      --tabLevel;

      if (tabLevel < 0) {
        tabLevel = 0;
      }

      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
    } else if (currentLine.matches("[^<>]*?/>")) {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];

      --tabLevel;

      if (tabLevel < 0) {
        tabLevel = 0;
      }
    } else {
      splitLines[lineNum] = GenerateTabs(tabLevel) + splitLines[lineNum];
    }
  }

  return String.join("\n", splitLines);
}

It makes one assumption: that there are no <> characters except for those that comprise the XML/XHTML tags.

Underscore-java8 has static method U.formatXml(string). I am the maintainer of the project. Live example

import com.github.underscore.lodash.U;

public class MyClass {
    public static void main(String args[]) {
        String xml = "<root>" + //
             "\n   "  + //
             "\n<name>Coco Puff</name>" + //
             "\n        <total>10</total>    </root>";

        System.out.println(U.formatXml(xml));
    }
}

Output:

<root>
   <name>Coco Puff</name>
   <total>10</total>
</root>

Create xml file :

new FileInputStream("xml Store - Copy.xml") ;// result xml file format incorrect ! 

so that, when parse the content of the given input source as an XML document and return a new DOM object.

Document original = null;
...
original.parse("data.xml");//input source as an XML document
易学教程内所有资源均来自网络或用户发布的内容,如有违反法律规定的内容欢迎反馈
该文章没有解决你所遇到的问题?点击提问,说说你的问题,让更多的人一起探讨吧!