Parsing large XML using SAX in Java

天命终不由人 2021-01-22 19:53

I am trying to parse the Stack Overflow data dump. One of the tables is called posts.xml, which has around 10 million entries in it. Sample XML:



        
4 Answers
  • 2021-01-22 20:07

    The SAX startElement event lets you process one XML element at a time.

    In Java code, you must implement this method:

    @Override
    public void startElement(String uri, String localName,
        String qName, Attributes attributes)
        throws SAXException {

        // Match on qName: localName is empty unless the parser is namespace-aware.
        if ("row".equals(qName)) {
            // This code is executed for every "row" element.
            // Attribute names are case-sensitive; the Stack Overflow dump spells it "Id".
            String id = attributes.getValue("Id");
            String postTypeId = attributes.getValue("PostTypeId");
            String acceptedAnswerId = attributes.getValue("AcceptedAnswerId");
            // Read the other two attributes the same way.
            // You now have the attribute values for this "row" element.
        }

    }
    

    For every element, you can access:

    1. The namespace URI (uri)
    2. The XML qualified name (qName)
    3. The XML local name (localName)
    4. The Attributes object, from which you can extract the attribute values you need
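
To make the four parameters above concrete, here is a minimal sketch, assuming the JDK's built-in JAXP parser. The class name `NameDemo` and the sample XML are made up for illustration. It prints what `startElement` receives with namespace awareness switched off and on; with it off, `localName` is typically empty, which is why matching on `qName` is the safer default.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not part of the original answer.
class NameDemo {
    // Parses xml and records the startElement arguments for every element.
    static List<String> describe(String xml, boolean namespaceAware) {
        List<String> seen = new ArrayList<>();
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(namespaceAware);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                            String qName, Attributes attributes) {
                        seen.add("uri=[" + uri + "] localName=[" + localName
                            + "] qName=[" + qName + "] attributes="
                            + attributes.getLength());
                    }
                });
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up sample; the attribute names mimic the posts.xml rows.
        String xml = "<posts><row Id=\"1\" PostTypeId=\"1\"/></posts>";
        System.out.println("namespace-aware = false");
        describe(xml, false).forEach(System.out::println);
        System.out.println("namespace-aware = true");
        describe(xml, true).forEach(System.out::println);
    }
}
```

If you do need `localName`, call `setNamespaceAware(true)` on the factory before creating the parser.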

    See the ContentHandler implementation for specific details.

    bye

    UPDATED: improved the previous snippet.

  • 2021-01-22 20:07

    Yes, you can override methods that process only the elements you want:

    • http://www.javacommerce.com/displaypage.jsp?name=saxparser1.sql&id=18232
    • http://www.java2s.com/Code/Java/XML/SAXDemo.htm
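
As a rough sketch of that idea: `DefaultHandler` already supplies empty implementations of every callback, so you override only the one you care about. The class name `QuestionIdHandler`, the `Id` attribute spelling, and the assumption that `PostTypeId="1"` marks a question are mine, not taken from the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keeps the Id of every row whose PostTypeId is "1".
class QuestionIdHandler extends DefaultHandler {
    final List<String> ids = new ArrayList<>();

    // The only override; all other events fall through to DefaultHandler's no-ops.
    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes attributes) {
        if ("row".equals(qName)
                && "1".equals(attributes.getValue("PostTypeId"))) {
            ids.add(attributes.getValue("Id"));
        }
    }

    // Convenience wrapper that parses a string and returns the collected ids.
    static List<String> questionIds(String xml) {
        try {
            QuestionIdHandler handler = new QuestionIdHandler();
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return handler.ids;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration.
        String xml = "<posts>"
            + "<row Id=\"4\" PostTypeId=\"1\"/>"
            + "<row Id=\"7\" PostTypeId=\"2\"/>"
            + "<row Id=\"9\" PostTypeId=\"1\"/>"
            + "</posts>";
        System.out.println(questionIds(xml));
    }
}
```

Rows that fail the check are simply skipped, so nothing is kept in memory for them.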
  • 2021-01-22 20:07

    SAX doesn't "load" elements. It informs your application of the start and end of each element, and it's entirely up to your application to decide which elements it takes any notice of.
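
A small sketch of what that means in practice, assuming the JDK's built-in parser; `EventTrace` and the sample XML are invented for illustration. The parser reports a start and an end event for every element, and the handler decides which names it records and which it ignores.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: records start/end events for selected element names.
class EventTrace {
    static List<String> trace(String xml, Set<String> wanted) {
        List<String> events = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                    String qName, Attributes attributes) {
                // The parser calls this for every element; we keep only some.
                if (wanted.contains(qName)) {
                    events.add("start " + qName);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if (wanted.contains(qName)) {
                    events.add("end " + qName);
                }
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return events;
    }

    public static void main(String[] args) {
        // Made-up sample: the <meta/> element is reported by the parser
        // but ignored by the handler.
        String xml = "<posts><row Id=\"1\"/><meta/><row Id=\"2\"/></posts>";
        trace(xml, Collections.singleton("row")).forEach(System.out::println);
    }
}
```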

  • 2021-01-22 20:23

    It is pretty much the same approach as the one I've already described here.

    Scroll down to the org.xml.sax Implementation part. You'll only need a custom handler.
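
For a file the size of posts.xml, the custom handler can be fed straight from a stream, so the document is never held in memory as a whole. This is only a sketch under my own assumptions: `RowCounter` is a made-up name, and the file path is whatever you pass on the command line.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler that counts "row" elements while the parser streams the input.
class RowCounter extends DefaultHandler {
    long rows;

    @Override
    public void startElement(String uri, String localName,
            String qName, Attributes attributes) {
        if ("row".equals(qName)) {
            rows++;
        }
    }

    // Streams the input through the handler and returns the number of rows seen.
    static long countRows(InputStream in) {
        try {
            RowCounter handler = new RowCounter();
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
            return handler.rows;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java RowCounter <path-to-xml>");
            return;
        }
        // The stream is read incrementally; only the counter is retained.
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            System.out.println(countRows(in));
        }
    }
}
```

Replace the counter with whatever per-row work you need; the point is that the handler sees one element at a time.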
