Question
I have a bunch of large XML files (total size of all files is more than 1 GB) and I need to transform them from a vendor schema to our schema.
The vendor has one ZIP file (it contains large XML files) at some FTP location on its server. I have to pick that ZIP file up and then transform all available XML files. After transforming to our schema format, I need to persist the data in a database.
What is a good design to implement this? What relevant tools and utilities are available for Java?
Answer 1:
Just use the regular Java API...
import java.io.*;
import java.util.zip.*;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;

File zipFile = new File("archive.zip");
File xsltFile = new File("transform.xslt");

// Compile the stylesheet once; the same Transformer can be reused for
// every entry (it is not thread-safe, but this loop is sequential).
Transformer transformer = TransformerFactory.newInstance()
        .newTransformer(new StreamSource(xsltFile));

try (ZipInputStream zipIn = new ZipInputStream(new FileInputStream(zipFile))) {
    // Shield the zip stream: some parsers close the InputStream they are
    // handed, which would make the remaining entries unreadable.
    InputStream noClose = new FilterInputStream(zipIn) {
        @Override public void close() { /* keep the zip stream open */ }
    };
    ZipEntry zipEntry;
    int n = 0;
    while ((zipEntry = zipIn.getNextEntry()) != null) {
        // One result file per entry; appending several transformed
        // documents to a single file would not be well-formed XML.
        try (OutputStream out = new FileOutputStream("transformed-" + n++ + ".xml")) {
            transformer.transform(new StreamSource(noClose), new StreamResult(out));
        }
        zipIn.closeEntry();
    }
}
Answer 2:
I like simple approaches. I would use a SAX or StAX implementation and avoid DOM entirely. But that's just me; maybe you will find a fancy library here that does all the work for you :)
A 1 GB XML document loaded into a DOM can eat all your RAM! Be careful which library you choose and what it uses behind the scenes.
I hope this helps!
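The streaming advice above can be sketched with the StAX API that ships in the JDK. This is a minimal illustration, not the asker's actual schema: the `StaxCount` class and the `<record>` element name are invented for the example. It walks the document event by event, so memory use stays constant no matter how large the input is:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxCount {

    // Count <record> start tags without building a DOM; the reader pulls
    // one event at a time, so a 1 GB file needs no more memory than a 1 KB one.
    static int countRecords(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            int count = 0;
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "record".equals(reader.getLocalName())) {
                    count++;
                }
            }
            reader.close();
            return count;
        } catch (XMLStreamException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<records><record/><record/><record/></records>";
        System.out.println(countRecords(xml)); // prints 3
    }
}
```

The same pattern extends to transformation: read events from the vendor document, map them to your schema, and write events out with an `XMLStreamWriter`, keeping only the current element's state in memory.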
Answer 3:
I used Saxon-EE for the transformation and Woodstox for XML unmarshalling.
Source: https://stackoverflow.com/questions/12482729/fastest-and-best-way-to-transform-large-xml-docs-from-one-format-to-another