How Do You Prevent A javax Transformer From Escaping Whitespace?

走远了吗. Posted on 2019-12-06 07:04:01

So the answer to this one turned out to be pretty lame: update Xalan. I don't know what was wrong with my old version, but when I switched to the latest version from http://xml.apache.org/xalan-j/downloads.html, the entity-escaping of tabs just went away. Thanks, everyone, for all your help, though.
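If you want to see which transformer implementation your code is actually picking up before and after swapping jars, a rough sketch like the following may help. The class name WhichTransformer and the method describeFactory are made up for illustration; the calls themselves are standard JAXP and java.lang APIs. A missing code source usually means the JDK's built-in implementation is in use rather than a separate Xalan jar.

```java
import java.security.CodeSource;
import javax.xml.transform.TransformerFactory;

// Hypothetical helper: reports which TransformerFactory implementation JAXP resolves.
public class WhichTransformer {

    public static String describeFactory() {
        TransformerFactory factory = TransformerFactory.newInstance();
        Class<?> impl = factory.getClass();
        // Classes loaded from the JDK itself often have no code source.
        CodeSource source = impl.getProtectionDomain().getCodeSource();
        String location = (source == null)
                ? "no code source, probably the JDK's built-in implementation"
                : String.valueOf(source.getLocation());
        return impl.getName() + " (" + location + ")";
    }

    public static void main(String[] args) {
        System.out.println(describeFactory());
    }
}
```

If the printed class is not the one you expected, the new jar is probably not on the classpath ahead of the old one.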

You could try using a SAXTransformerFactory in combination with an XMLReader.

Something like:

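import java.io.FileInputStream;
import java.io.StringWriter;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;

SAXTransformerFactory transformFactory = (SAXTransformerFactory) TransformerFactory.newInstance();
StreamSource source = new StreamSource(TRANSFORMER_PATH);
StringWriter extractionWriter = new StringWriter();

TransformerHandler transformerHandler;
try {
    transformerHandler = transformFactory.newTransformerHandler(source);
    transformerHandler.setResult(new StreamResult(extractionWriter));
} catch (TransformerConfigurationException e) {
    throw new SAXException("Unable to create transformerHandler due to transformer configuration exception.", e); // keep the cause
}

// XSLT needs namespace information, and SAXParserFactory is not namespace-aware by default.
SAXParserFactory parserFactory = SAXParserFactory.newInstance();
parserFactory.setNamespaceAware(true);
XMLReader reader = parserFactory.newSAXParser().getXMLReader();
reader.setContentHandler(transformerHandler);
reader.parse(new InputSource(new FileInputStream(xml))); // byte stream, so the parser detects the encoding
System.err.println(extractionWriter.toString());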

You should be able to set the SAX parser to not include ignorable whitespace, if it doesn't already do it by default. I haven't actually tested this, but I do something similar in one of my projects.
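One way to drop ignorable whitespace without relying on parser-specific settings is to put a SAX filter between the reader and the transformer handler. The sketch below is untested against the original problem, and the class name IgnorableWhitespaceFilter is made up for illustration. Note that a parser can only report whitespace as ignorable when a DTD or schema tells it that an element has element-only content; without one, the whitespace arrives as ordinary characters and passes through untouched.

```java
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical filter: swallows ignorableWhitespace events, forwards everything else.
public class IgnorableWhitespaceFilter extends XMLFilterImpl {

    public IgnorableWhitespaceFilter(XMLReader parent) {
        super(parent);
    }

    @Override
    public void ignorableWhitespace(char[] ch, int start, int length) throws SAXException {
        // Deliberately not forwarded to the downstream ContentHandler.
    }
}
```

To use it with the snippet above, wrap the reader (new IgnorableWhitespaceFilter(reader)), then call setContentHandler(transformerHandler) and parse(...) on the filter instead of on the reader.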

Sometimes with things like this, replacing them yourself with a regex afterwards is not an entirely bad option; it at least gets you going until you find a better one.
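For the tab case, such a stopgap could look roughly like this. The class and method names are made up, and the pattern only targets character references for the tab character (decimal or hex). Treat it as a workaround: it will also rewrite matches inside comments or CDATA sections, and a literal tab inside an attribute value is normalized to a space when the document is parsed again.

```java
// Hypothetical post-processing step for the serialized transformer output.
public class TabEntityCleanup {

    // Replaces &#9; / &#09; / &#x9; / &#X09; style references with a literal tab.
    public static String unescapeTabs(String xml) {
        return xml.replaceAll("&#(?:0*9|[xX]0*9);", "\t");
    }

    public static void main(String[] args) {
        System.out.println(unescapeTabs("<row>a&#9;b</row>"));
    }
}
```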

Is there any reason you are reading the file into a string first instead of using a file stream directly?

Instead of

String xml = FileUtils.readFileToString(new File(sampleXmlPath));
transformer.transform(new StreamSource(new StringReader(xml)),
    new StreamResult(extractionWriter));

You could try

transformer.transform(new StreamSource(new FileReader(sampleXmlPath)),
    new StreamResult(extractionWriter));

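This may not be the cause of the problem, but I've seen it cause similar problems before. If your FileUtils.readFileToString is the Commons IO version, the overload that takes no encoding argument decodes the file with the platform default charset, which is not necessarily the UTF-8 you want. Keep in mind that FileReader also decodes with the platform default charset, so if encoding is the issue, giving StreamSource a File or an InputStream lets the XML parser pick the encoding up from the XML declaration instead.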
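A minimal sketch of the File-based variant, assuming you already have a Transformer; the class name FileSourceTransform and the method transformFile are made up for illustration. Because the source is handed over as bytes, no charset is chosen in your own code.

```java
import java.io.File;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: transforms a file without decoding it into a String first.
public class FileSourceTransform {

    public static String transformFile(Transformer transformer, File xmlFile)
            throws TransformerException {
        StringWriter extractionWriter = new StringWriter();
        // StreamSource(File) passes the raw bytes on, so the parser works out the encoding itself.
        transformer.transform(new StreamSource(xmlFile), new StreamResult(extractionWriter));
        return extractionWriter.toString();
    }
}
```

With the names from the snippets above, the call would be something like transformFile(transformer, new File(sampleXmlPath)).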
