Better way to parse XML

感动是毒 2021-02-06 05:05

I've been parsing XML like this for years, and I have to admit that as the number of different elements grows, I find it boring and exhausting to do. Here is what I m

9 answers
  •  南方客 (original poster)
    2021-02-06 05:55

    In SAX the parser "pushes" events at your handler, so you have to do all the housekeeping as you are used to here. An alternative would be StAX (the javax.xml.stream package), which is still streaming but your code is responsible for "pulling" events from the parser. This way the logic of what elements are expected in what order is encoded in the control flow of your program rather than having to be explicitly represented in booleans.
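
    To make the "pull" style concrete, here is a minimal sketch using only the JDK's javax.xml.stream API. The class name StaxItemReader and the sample document are made up for illustration; the element names Item, ItemId and ItemName are borrowed from the XOM example further down. Note how "we are inside an Item" is expressed by being inside the readItem method rather than by a boolean flag.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class, not part of the original question or answer.
public class StaxItemReader {

    // Pulls events until the document ends and returns one
    // "ItemId: ItemName" string per <Item> element.
    public static List<String> readItems(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<String> items = new ArrayList<>();
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && "Item".equals(reader.getLocalName())) {
                    items.add(readItem(reader));
                }
            }
        } finally {
            reader.close();
        }
        return items;
    }

    // Called with the reader positioned on an <Item> start tag; consumes
    // events up to and including the matching </Item> end tag.
    private static String readItem(XMLStreamReader reader) throws XMLStreamException {
        String id = null;
        String name = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                String tag = reader.getLocalName();
                if ("ItemId".equals(tag)) {
                    id = reader.getElementText();   // leaves reader on </ItemId>
                } else if ("ItemName".equals(tag)) {
                    name = reader.getElementText(); // leaves reader on </ItemName>
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "Item".equals(reader.getLocalName())) {
                break;
            }
        }
        return id + ": " + name;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Items>"
                + "<Item><ItemId>1</ItemId><ItemName>Widget</ItemName></Item>"
                + "<Item><ItemId>2</ItemId><ItemName>Gadget</ItemName></Item>"
                + "</Items>";
        readItems(xml).forEach(System.out::println);
    }
}
```

    This sketch assumes ItemId and ItemName contain only text (getElementText throws if it meets a child element) and that Item elements are not nested inside one another.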

    Depending on the precise structure of the XML there may be a "middle way" using a toolkit like XOM, which has a mode of operation where you parse a subtree of the document into a DOM-like object model, process that twig, then throw it away and parse the next one. This is good for repetitive documents with many similar elements that can each be processed in isolation - you get the ease of programming to a tree-based API within each twig but still have the streaming behaviour that lets you parse huge documents efficiently.

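    import nu.xom.Element;
    import nu.xom.NodeFactory;
    import nu.xom.Nodes;

    // Pass an instance to the parser: new nu.xom.Builder(new ItemProcessor()).build(inputStream)
    public class ItemProcessor extends NodeFactory {
      // Returning an empty Nodes list drops the finished element from the tree
      private final Nodes emptyNodes = new Nodes();
    
      @Override
      public Nodes finishMakingElement(Element elt) {
        if("Item".equals(elt.getLocalName())) {
          // process the Item element here; it is a complete subtree at this point
          System.out.println(elt.getFirstChildElement("ItemId").getValue()
             + ": " + elt.getFirstChildElement("ItemName").getValue());
    
          // then throw it away so it never accumulates in memory
          return emptyNodes;
        } else {
          // keep every other element as the default factory would
          return super.finishMakingElement(elt);
        }
      }
    }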
    

    You can achieve a similar thing with a combination of StAX and JAXB: define JAXB-annotated classes that represent your repeating element (Item in this example), create a StAX parser, and navigate to the first Item start tag. From there you can unmarshal one complete Item at a time from the XMLStreamReader, using Unmarshaller.unmarshal(XMLStreamReader, Class).
