Speeding up XPath

广开言路 2020-12-24 14:09

I have a 1,000-entry document whose format is something like:

    <Example>
         <Entry>
              <n1>...</n1>
              <n2>...</n2>
         </Entry>
    </Example>
6 answers
  • 2020-12-24 14:20

    Use the Jaxen library for XPath evaluation: http://jaxen.codehaus.org/

  • 2020-12-24 14:26

    I had a similar issue with XPath evaluation. I switched to CachedXPathAPI, which was about 100x faster than the XPathAPI I had been using earlier. More information about this API is available here: http://xml.apache.org/xalan-j/apidocs/org/apache/xpath/CachedXPathAPI.html

    Hope it helps. Cheers, Madhusudhan

  • 2020-12-24 14:31

    The correct solution is to detach the node right after you call item(i), like so:

    Node node = results.item(index);
    // Detach the node so that later XPath calls on it only see its own subtree.
    node.getParentNode().removeChild(node);
    nodes.add(node);
    

    See the question "XPath.evaluate performance slows down (absurdly) over multiple calls".
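The detach step above can be sketched end to end using only the JDK's built-in XPath API. This is a minimal sketch, not the original poster's code: the class name `DetachDemo`, the method `readEntries`, and the sample XML are assumptions made for illustration, and the element names `Example`, `Entry`, `n1`, and `n2` are taken from the other answers in this thread.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; names are illustrative only.
public class DetachDemo {

    // Returns "n1|n2" for every Entry, detaching each Entry before querying it.
    static List<String> readEntries(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Compile the relative expressions once instead of once per entry.
        XPathExpression n1 = xpath.compile("n1");
        XPathExpression n2 = xpath.compile("n2");

        NodeList results = (NodeList) xpath.evaluate(
                "/Example/Entry", doc, XPathConstants.NODESET);

        List<String> values = new ArrayList<>();
        for (int index = 0; index < results.getLength(); index++) {
            Node node = results.item(index);
            // Detach the entry from the document before evaluating against it.
            node.getParentNode().removeChild(node);
            values.add(n1.evaluate(node) + "|" + n2.evaluate(node));
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Example>"
                + "<Entry><n1>a</n1><n2>b</n2></Entry>"
                + "<Entry><n1>c</n1><n2>d</n2></Entry>"
                + "</Example>";
        System.out.println(readEntries(xml));
    }
}
```

Note that detaching modifies the parsed document, so this only fits cases where the DOM is not needed intact afterwards.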

  • 2020-12-24 14:40

    What kind of parser are you using?

    DOM pulls the whole document into memory. Once it is loaded, operations on it can be fast, but building the tree inside a web app request or a loop can have a noticeable cost.

    A SAX parser streams the document instead: it reports elements through callbacks as it reads them and never builds a tree in memory.

    So try to use a parser implementation that suits your needs.
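To make the contrast concrete, here is a small sketch that counts `Entry` elements both ways with the parsers that ship with the JDK. The class name `ParserComparison`, its method names, and the sample input are assumptions for illustration; only the element name `Entry` comes from this thread.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; names are illustrative only.
public class ParserComparison {

    // DOM: the whole tree is built in memory first, then queried.
    static int countWithDom(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getElementsByTagName("Entry").getLength();
    }

    // SAX: no tree is built; the handler is called back for each element.
    static int countWithSax(byte[] xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                            String qName, Attributes atts) {
                        // The default factory is not namespace-aware, so use qName.
                        if ("Entry".equals(qName)) {
                            count[0]++;
                        }
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = ("<Example><Entry><n1>a</n1></Entry>"
                + "<Entry><n1>b</n1></Entry></Example>")
                .getBytes(StandardCharsets.UTF_8);
        System.out.println("DOM: " + countWithDom(xml));
        System.out.println("SAX: " + countWithSax(xml));
    }
}
```

Both methods return the same count; the difference is that the DOM version keeps every node in memory while the SAX version keeps only a counter.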

  • 2020-12-24 14:42

    If you need to parse huge but flat documents, SAX is a good alternative. It allows you to handle the XML as a stream instead of building a huge DOM. Your example could be parsed using a ContentHandler like this:

    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.ext.DefaultHandler2;
    
    public class ExampleHandler extends DefaultHandler2 {
    
        private StringBuffer chars = new StringBuffer(1000);
    
        private MyEntry currentEntry;
        private MyEntryHandler myEntryHandler;
    
        ExampleHandler(MyEntryHandler myEntryHandler) {
            this.myEntryHandler = myEntryHandler;
        }
    
        @Override
        public void characters(char[] ch, int start, int length)
                throws SAXException {
            // Append only the reported slice; the rest of ch is not part of this event.
            chars.append(ch, start, length);
        }
    
        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
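            // localName is only populated when the parser is namespace-aware
            // (SAXParserFactory.setNamespaceAware(true)); otherwise compare qName.
            if ("Entry".equals(localName)) {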
                myEntryHandler.handle(currentEntry);
                currentEntry = null;
            }
            else if ("n1".equals(localName)) {
                currentEntry.setN1(chars.toString());
            }
            else if ("n2".equals(localName)) {
                currentEntry.setN2(chars.toString());
            }
        }
    
    
        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) throws SAXException {
            chars.setLength(0);
            if ("Entry".equals(localName)) {
                currentEntry = new MyEntry();
            }
        }
    }
    

    If the document has a deeper and more complex structure, you will need a stack to keep track of the current path in the document. At that point, consider writing a general-purpose ContentHandler to do the dirty work and using it with your document-type-dependent handlers.
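One way to sketch the stack idea is a handler that pushes each element name on entry, pops it on exit, and records the text of leaf elements together with their full path. The class `PathTrackingHandler`, its `collect` helper, and the `path=text` output format are assumptions for illustration, not part of the original answer. The sketch only handles leaf text; mixed content (text interleaved with child elements) would need more care.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical general-purpose handler; names are illustrative only.
public class PathTrackingHandler extends DefaultHandler {

    // Element names from the root (first) down to the current element (last).
    private final Deque<String> path = new ArrayDeque<>();
    private final StringBuilder chars = new StringBuilder();
    private final List<String> leaves = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes atts) {
        path.addLast(qName);
        chars.setLength(0);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        chars.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = chars.toString().trim();
        if (!text.isEmpty()) {
            // Record "path=text" for elements that ended with pending text.
            leaves.add("/" + String.join("/", path) + "=" + text);
        }
        chars.setLength(0);
        path.removeLast();
    }

    // Parses the XML string and returns the recorded "path=text" pairs.
    static List<String> collect(String xml) throws Exception {
        PathTrackingHandler handler = new PathTrackingHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.leaves;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Example><Entry><n1>a</n1><n2>b</n2></Entry></Example>";
        collect(xml).forEach(System.out::println);
    }
}
```

A document-specific handler could then match on the path string (for example `/Example/Entry/n1`) instead of on a bare element name, which avoids collisions when the same name appears at different depths.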

  • 2020-12-24 14:45

    Try VTD-XML. It uses less memory than DOM. It is easier to use than SAX and supports XPath. Here is some sample code to help you get started. It applies an XPath to get the Entry elements and then prints out the n1 and n2 child elements.

    final VTDGen vg = new VTDGen();
    vg.parseFile("/path/to/file.xml", false);
    
    final VTDNav vn = vg.getNav();
    final AutoPilot ap = new AutoPilot(vn);
    ap.selectXPath("/Example/Entry");
    int count = 1;
    while (ap.evalXPath() != -1) {
        System.out.println("Inside Entry: " + count);
    
        //move to n1 child
        vn.toElement(VTDNav.FIRST_CHILD, "n1");
        System.out.println("\tn1: " + vn.toNormalizedString(vn.getText()));
    
        //move to n2 child
        vn.toElement(VTDNav.NEXT_SIBLING, "n2");
        System.out.println("\tn2: " + vn.toNormalizedString(vn.getText()));
    
        //move back to parent
        vn.toElement(VTDNav.PARENT);
        count++;
    }
    