How do I parse a large XML file using Java?

穿精又带淫゛_ submitted on 2021-02-07 10:50:15

Question


I am trying to parse an XML file using Java.

The XML file is only 256 KB. I am using a DOM parser to parse it. How can I parse the content of a large XML file?

Here's the method that parses the file content:

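import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public Document parse_a_string(StringBuffer decodedFile) {
    Document doc1 = null;
    try {
        DocumentBuilderFactory factory =
                DocumentBuilderFactory.newInstance();
        DocumentBuilder db = factory.newDocumentBuilder();
        InputSource inStream = new InputSource();

        // problem here
        inStream.setCharacterStream(new StringReader(decodedFile.toString()));

        doc1 = db.parse(inStream);
    } catch (ParserConfigurationException | SAXException | IOException e) {
        // Report the failure instead of silently swallowing it
        e.printStackTrace();
    }
    return doc1;
}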

The file content is held in a StringBuffer object, decodedFile, but StringReader accepts only a String.


Answer 1:


For large documents (though I wouldn't call yours large) I'd use StAX.




Answer 2:


256 KB is a pretty small file nowadays: yesterday I was working with a 45 GB file, which is roughly 200,000 times larger!

It's not clear what your problem is. Any of the normal Java parsing techniques will work perfectly well. Which of them you use depends on why you are parsing the file and what you want to do with the data.

Having said that, many people seem to choose DOM by default because it is so well entrenched. However, more modern object models such as JDOM or XOM are much easier to work with.




Answer 3:


Take a look at the JDOM XML parsing library. In my opinion, it's miles ahead of the native Java parsers.

For the code you provided, you still have to walk the DOM tree and retrieve the elements you need. See the official Java tutorial on working with XML for more information.
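As a rough illustration of walking the tree once you have a Document, here is a minimal sketch using only the standard org.w3c.dom API. The class name DomWalkExample, the helper names, and the sample `<books>`/`<title>` document are made up for this example and are not from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomWalkExample {

    // Parses an XML string into a DOM Document.
    public static Document parse(String xml) throws Exception {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new InputSource(new StringReader(xml)));
    }

    // Collects the trimmed text of every element with the given tag name.
    public static List<String> textOf(Document doc, String tagName) {
        List<String> result = new ArrayList<>();
        NodeList nodes = doc.getElementsByTagName(tagName);
        for (int i = 0; i < nodes.getLength(); i++) {
            Element element = (Element) nodes.item(i);
            result.add(element.getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical sample document, not the asker's data
        String xml = "<books><book><title>A</title></book>"
                + "<book><title>B</title></book></books>";
        Document doc = parse(xml);
        System.out.println(textOf(doc, "title"));
    }
}
```

getElementsByTagName searches the whole document; for more selective access you would walk getChildNodes() from getDocumentElement() instead.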




Answer 4:


You might want to look at a StAX implementation like Woodstox. It lets you pull elements from the parser, instead of the parser pushing data into the app, and lets you pause parsing.
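To show what "pulling" looks like, here is a small sketch using the standard javax.xml.stream API that ships with the JDK. As far as I know, Woodstox plugs in behind the same XMLInputFactory when it is on the classpath, but treat that as an assumption. The class name StaxPullExample and the `title` element are invented for the example.

```java
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullExample {

    // Pulls events one at a time and keeps only the text of <title> elements.
    // Nothing else from the document is held in memory.
    public static List<String> readTitles(Reader input) throws XMLStreamException {
        List<String> titles = new ArrayList<>();
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(input);
        try {
            while (reader.hasNext()) {
                int event = reader.next(); // the application asks for the next event
                if (event == XMLStreamConstants.START_ELEMENT
                        && "title".equals(reader.getLocalName())) {
                    // Reads the element's text and advances to its end tag
                    titles.add(reader.getElementText());
                }
            }
        } finally {
            reader.close();
        }
        return titles;
    }
}
```

Because the loop is driven by your own code, you can stop calling next() at any point, which is the "pause" mentioned above.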




Answer 5:


Don't read the file into a String/StringReader and all that jazz. Parse the file directly via db.parse(new FileInputStream(...)). Reading the whole file into a String first just wastes memory and time.
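A minimal sketch of that approach, assuming the XML is available as a file on disk. The class name ParseFromFileExample and the method parseFile are hypothetical; the try-with-resources block is my addition so the stream gets closed.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class ParseFromFileExample {

    // Hands the byte stream straight to the parser; no intermediate String is built.
    public static Document parseFile(File xmlFile)
            throws IOException, SAXException, ParserConfigurationException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = new FileInputStream(xmlFile)) {
            return db.parse(in);
        }
    }
}
```

Passing bytes rather than characters also lets the parser pick up the encoding from the XML declaration itself.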



Source: https://stackoverflow.com/questions/9197509/how-do-i-parse-a-large-xml-file-using-java
