I have a formatted (pretty-printed) XML file, and I want to convert it to a single-line string. How can I do that?
Sample xml:
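import java.io.BufferedReader;
import java.io.FileReader;

// filename is the path to the XML file
StringBuilder sb = new StringBuilder();
// try-with-resources closes the reader even if reading fails
try (BufferedReader br = new BufferedReader(new FileReader(filename))) {
    String line;
    while ((line = br.readLine()) != null) {
        // drop the indentation and trailing white space of every line
        sb.append(line.trim());
    }
}
String oneLine = sb.toString();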
Using StringBuilder is more efficient than string concatenation in a loop; see http://kaioa.com/node/59
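As a rough, self-contained sketch of the same line-trimming idea, the loop can be wrapped in a method that accepts any Reader, which makes it easy to try on an in-memory string. The class name LineTrimmer and the method name toSingleLine are made up for this illustration and do not come from the answer above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, named only for this illustration.
final class LineTrimmer {

    /** Reads every line, trims it, and joins the results without separators. */
    static String toSingleLine(Reader source) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(source)) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line.trim());
            }
        } catch (IOException e) {
            // Rethrow unchecked so callers do not need a throws clause.
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pretty = "<root>\n  <item>a b</item>\n  <item>c</item>\n</root>\n";
        System.out.println(toSingleLine(new StringReader(pretty)));
        // prints <root><item>a b</item><item>c</item></root>
    }
}
```

Because the lines are simply glued together, a tag whose attributes wrap onto the next line would lose the separating white space, and text content that spans several lines is trimmed at each line break as well.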
The above solutions work if you are compressing all white space in the XML document. Other quick options are JDOM (using Format.getCompactFormat()) and dom4j (using OutputFormat.createCompactFormat()) when outputting the XML document.
However, I had a unique requirement to preserve the white space contained within the element's text value and these solutions did not work as I needed. All I needed was to remove the 'pretty-print' formatting added to the XML document.
The solution I came up with is a three-step regex process. It is split into separate steps here only to make the algorithm easier to follow.
String regex, updatedXml;
// originalXmlStr holds the pretty-printed XML document
// 1. remove all white space preceding a begin element tag
//    (\s already covers \n, so a separate \n is not needed):
regex = "\\s+(<[^/])";
updatedXml = originalXmlStr.replaceAll( regex, "$1" );
// 2. remove all white space following an end element tag
//    (the hyphen is placed last in the class so it is read as a literal):
regex = "(</[a-zA-Z0-9_.:-]+>)\\s+";
updatedXml = updatedXml.replaceAll( regex, "$1" );
// 3. remove all white space following an empty element tag
// (<some-element xmlns:attr1="some-value".... />):
regex = "(/>)\\s+";
updatedXml = updatedXml.replaceAll( regex, "$1" );
NOTE: The code is Java. The '$1' in the replacement string refers to the first capture group, so each matched tag is kept and only the white space next to it is dropped.
This will simply remove the white space used when adding the 'pretty-print' format to an XML document, yet preserve all other white space when it is part of the element text value.
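To see the three steps working together, here is a minimal sketch that chains them in one method and runs them on a small document. The class name PrettyPrintStripper, the method name strip, and the sample XML are invented for this example; the regexes are the three above, written slightly more compactly.

```java
// Hypothetical helper class, named only for this illustration.
final class PrettyPrintStripper {

    /** Applies the three regex steps in order and returns the result. */
    static String strip(String xml) {
        // 1. white space before a begin tag ('<' not followed by '/')
        String out = xml.replaceAll("\\s+(<[^/])", "$1");
        // 2. white space after an end tag such as </name> or </ns:name>
        out = out.replaceAll("(</[\\w.:-]+>)\\s+", "$1");
        // 3. white space after an empty-element tag such as <flag/>
        out = out.replaceAll("(/>)\\s+", "$1");
        return out;
    }

    public static void main(String[] args) {
        String pretty = "<root>\n  <name>  John  Smith </name>\n  <flag/>\n</root>\n";
        System.out.println(strip(pretty));
        // prints <root><name>  John  Smith </name><flag/></root>
    }
}
```

The spaces inside the name element survive because none of the three patterns matches white space that sits between text and an end tag. One caveat of this sketch: step 1 also removes white space between text and a following begin tag in mixed content (for example "some <b>bold</b>" becomes "some<b>bold</b>"), and comments or CDATA sections get no special treatment.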
Run it through an XSLT identity transform with <xsl:output indent="no"/> and <xsl:strip-space elements="*"/>:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output indent="no" />
    <xsl:strip-space elements="*"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
It will remove any of the non-significant whitespace and produce the expected output that you posted.
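For anyone applying the stylesheet from Java, the following is a minimal sketch using the JDK's built-in javax.xml.transform API. The class name XsltCompactor and the method name compact are hypothetical, the stylesheet is embedded as a string for brevity, and omitting the XML declaration is a choice made for this example (it keeps the output to the bare document), not part of the stylesheet above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper class, named only for this illustration.
final class XsltCompactor {

    // The identity transform from the answer, embedded as a string.
    private static final String IDENTITY_XSLT =
          "<?xml version=\"1.0\"?>"
        + "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output indent=\"no\"/>"
        + "<xsl:strip-space elements=\"*\"/>"
        + "<xsl:template match=\"@*|node()\">"
        + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the identity transform and returns the serialized result. */
    static String compact(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(IDENTITY_XSLT)));
            // Assumption for this sketch: leave out the <?xml ...?> declaration.
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transform failed", e);
        }
    }

    public static void main(String[] args) {
        String pretty = "<root>\n  <a>x  y</a>\n  <b>z</b>\n</root>\n";
        System.out.println(compact(pretty));
    }
}
```

Only text nodes that consist entirely of white space are stripped, so white space inside real text content, such as the two spaces in "x  y", should be left alone.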
Starting from this answer, which provides code that uses dom4j for pretty-printing, change the line that sets the output format from createPrettyPrint() to createCompactFormat():
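import java.io.StringWriter;
// Assumption: StringUtils comes from Apache Commons Lang 3; adjust the package for older versions.
import org.apache.commons.lang3.StringUtils;
import org.dom4j.DocumentHelper;
import org.dom4j.io.OutputFormat;
import org.dom4j.io.XMLWriter;

public String unPrettyPrint(final String xml){
    if (StringUtils.isBlank(xml)) {
        throw new RuntimeException("xml was null or blank in unPrettyPrint()");
    }
    final StringWriter sw;
    try {
        // compact format: no indentation and no added line breaks between elements
        final OutputFormat format = OutputFormat.createCompactFormat();
        final org.dom4j.Document document = DocumentHelper.parseText(xml);
        sw = new StringWriter();
        final XMLWriter writer = new XMLWriter(sw, format);
        writer.write(document);
    }
    catch (Exception e) {
        throw new RuntimeException("Error un-pretty printing xml:\n" + xml, e);
    }
    return sw.toString();
}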