I have a Java String that contains XML, with no line feeds or indentations. I would like to turn it into a String with nicely formatted XML. How do I do this?
If you do not really need indentation and a few line breaks are enough, a simple regex replacement may be sufficient:
String leastPrettifiedXml = uglyXml.replaceAll("><", ">\n<");
The code is nice; the result is less so, because the indentation is still missing.
(For solutions with indentation, see other answers.)
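To make the trade-off concrete, here is a minimal, self-contained sketch of the regex approach. The class name LineBreakDemo, the method name addLineBreaks and the sample input are made up for illustration. Keep in mind that the replacement works on the raw text rather than on parsed XML, so a "><" sequence inside a CDATA section or a comment would be split as well.

```java
public class LineBreakDemo {

    // Inserts a line break between every pair of adjacent tags.
    // The input is treated as plain text; no indentation is added.
    public static String addLineBreaks(String uglyXml) {
        return uglyXml.replaceAll("><", ">\n<");
    }

    public static void main(String[] args) {
        System.out.println(addLineBreaks("<a><b>x</b></a>"));
        // prints:
        // <a>
        // <b>x</b>
        // </a>
    }
}
```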
Hmmm... I faced something like this, and it is a known bug. Just add this output property:
transformer.setOutputProperty(OutputPropertiesFactory.S_KEY_INDENT_AMOUNT, "8");
Hope this helps ...
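For context, here is a rough sketch of where such a property fits when formatting a String with the JDK's javax.xml.transform API only. It assumes the JDK's built-in (Xalan-derived) transformer is the one being picked up; other implementations may ignore or reject the property. Instead of the Xalan constant, it uses the string key "{http://xml.apache.org/xslt}indent-amount" that also appears in another answer in this thread. The class name TransformerIndentDemo and the method name format are illustrative.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class TransformerIndentDemo {

    // Runs the XML through an identity transform with indentation switched on.
    // "indent" is the number of spaces per nesting level.
    public static String format(String xml, int indent) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            // Assumption: the built-in JDK transformer, which understands this key.
            transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount",
                    String.valueOf(indent));

            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                    new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Wrap the checked exception so callers get a simple String-in, String-out method.
            throw new IllegalArgumentException("Could not format XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(format("<a><b>x</b></a>", 8));
    }
}
```

Without the indent-amount property, setting INDENT to "yes" alone may only produce line breaks, which seems to be the behaviour this answer is working around.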
If using a 3rd party XML library is ok, you can get away with something significantly simpler than what the currently highest-voted answers suggest.
It was stated that both input and output should be Strings, so here's a utility method that does just that, implemented with the XOM library:
import nu.xom.*;
import java.io.*;
[...]
public static String format(String xml) throws ParsingException, IOException {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    Serializer serializer = new Serializer(out);
    serializer.setIndent(4); // or whatever you like
    serializer.write(new Builder().build(xml, ""));
    return out.toString("UTF-8");
}
I tested that it works, and the results do not depend on your JRE version or anything like that. To see how to customise the output format to your liking, take a look at the Serializer API.
This actually came out longer than I thought - some extra lines were needed because Serializer wants an OutputStream to write to. But note that there's very little code for actual XML twiddling here.
(This answer is part of my evaluation of XOM, which was suggested as one option in my question about the best Java XML library to replace dom4j. For the record, with dom4j you could achieve this with similar ease using XMLWriter and OutputFormat. Edit: ...as demonstrated in mlo55's answer.)
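I've pretty-printed XML in the past using the org.dom4j.io.OutputFormat.createPrettyPrint() method:
import java.io.StringWriter;
import org.apache.commons.lang3.StringUtils; // assumed: Apache Commons Lang 3; the original does not say which StringUtils
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public String prettyPrint(final String xml) {
    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in prettyPrint()");
    }
    final StringWriter sw;
    try {
        // createPrettyPrint() gives a format with indentation and line breaks enabled
        final OutputFormat format = OutputFormat.createPrettyPrint();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    }
    catch (Exception e) {
        throw new RuntimeException("Error pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}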
None of the above solutions worked for me. Then I found this: http://myshittycode.com/2014/02/10/java-properly-indenting-xml-string/
The trick is to remove the whitespace-only text nodes with XPath before transforming:
String xml = "<root>" +
        "\n " +
        "\n<name>Coco Puff</name>" +
        "\n <total>10</total> </root>";
try {
    Document document = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new ByteArrayInputStream(xml.getBytes("utf-8"))));
    XPath xPath = XPathFactory.newInstance().newXPath();
    NodeList nodeList = (NodeList) xPath.evaluate("//text()[normalize-space()='']",
            document,
            XPathConstants.NODESET);
    for (int i = 0; i < nodeList.getLength(); ++i) {
        Node node = nodeList.item(i);
        node.getParentNode().removeChild(node);
    }
    Transformer transformer = TransformerFactory.newInstance().newTransformer();
    transformer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    transformer.setOutputProperty(OutputKeys.INDENT, "yes");
    transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    StringWriter stringWriter = new StringWriter();
    StreamResult streamResult = new StreamResult(stringWriter);
    transformer.transform(new DOMSource(document), streamResult);
    System.out.println(stringWriter.toString());
}
catch (Exception e) {
    e.printStackTrace();
}
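To isolate what the XPath step does, here is a small sketch that only parses a String and strips the whitespace-only text nodes, reporting how many were removed. The reasoning, as far as I can tell, is that such nodes are ordinary DOM content and get written out together with whatever indentation the transformer adds. The class name BlankTextNodes and the method names parse and strip are made up for this illustration; only JDK classes are used.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BlankTextNodes {

    // Parses an XML String into a DOM document.
    public static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Removes every text node that is empty or whitespace-only
    // and returns the number of nodes removed.
    public static int strip(Document document) {
        try {
            NodeList blanks = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//text()[normalize-space()='']", document,
                            XPathConstants.NODESET);
            int count = blanks.getLength(); // remember the size before modifying the tree
            for (int i = 0; i < count; i++) {
                Node node = blanks.item(i);
                node.getParentNode().removeChild(node);
            }
            return count;
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Invalid XPath expression", e);
        }
    }

    public static void main(String[] args) {
        Document document = parse("<a> <b>x</b> </a>");
        System.out.println(strip(document)); // prints 2: the blanks before and after <b>
        System.out.println(strip(document)); // prints 0: nothing left to remove
    }
}
```

The text node "x" inside b is left alone, because normalize-space() of it is not the empty string.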
Underscore-java has a static method U.formatXml(string). I am the maintainer of the project. Example:
import com.github.underscore.lodash.U;

public class MyClass {
    public static void main(String[] args) {
        String xml = "<tag><nested>hello</nested></tag>";
        System.out.println(U.formatXml("<?xml version=\"1.0\" encoding=\"UTF-8\"?><root>" + xml + "</root>"));
    }
}
Output:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<tag>
<nested>hello</nested>
</tag>
</root>