Better way to parse xml

感动是毒 2021-02-06 05:05

I've been parsing XML like this for years, and I have to admit that when the number of different elements grows, I find it a bit boring and exhausting to do. Here is what I m

9 answers
  • 2021-02-06 05:54

    Here's an example of using JAXB with StAX.

    Input document:

    <?xml version="1.0" encoding="UTF-8"?>
    <Personlist xmlns="http://example.org">
        <Person>
            <Name>Name 1</Name>
            <Address>
                <StreetAddress>Somestreet</StreetAddress>
                <PostalCode>00001</PostalCode>
                <CountryName>Finland</CountryName>
            </Address>
        </Person>
        <Person>
            <Name>Name 2</Name>
            <Address>
                <StreetAddress>Someotherstreet</StreetAddress>
                <PostalCode>43400</PostalCode>
                <CountryName>Sweden</CountryName>
            </Address>
        </Person>
    </Personlist>
    

    Person.java:

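    import javax.xml.bind.annotation.XmlAccessType;
    import javax.xml.bind.annotation.XmlAccessorType;
    import javax.xml.bind.annotation.XmlElement;
    import javax.xml.bind.annotation.XmlRootElement;
    
    // JAXB (javax.xml.bind) ships with Java 8; on Java 11+ it must be added as a dependency.
    @XmlRootElement(name = "Person", namespace = "http://example.org")
    @XmlAccessorType(XmlAccessType.FIELD) // bind the annotated fields, not the getters
    public class Person {
        @XmlElement(name = "Name", namespace = "http://example.org")
        private String name;
        @XmlElement(name = "Address", namespace = "http://example.org")
        private Address address;
    
        public String getName() {
            return name;
        }
    
        public Address getAddress() {
            return address;
        }
    }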
    

    Address.java:

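    import javax.xml.bind.annotation.XmlAccessType;
    import javax.xml.bind.annotation.XmlAccessorType;
    import javax.xml.bind.annotation.XmlElement;
    
    // Not a root element: it is only unmarshalled as the <Address> child of <Person>.
    @XmlAccessorType(XmlAccessType.FIELD)
    public class Address {
        @XmlElement(name = "StreetAddress", namespace = "http://example.org")
        private String streetAddress;
        @XmlElement(name = "PostalCode", namespace = "http://example.org")
        private String postalCode;
        @XmlElement(name = "CountryName", namespace = "http://example.org")
        private String countryName;
    
        public String getStreetAddress() {
            return streetAddress;
        }
    
        public String getPostalCode() {
            return postalCode;
        }
    
        public String getCountryName() {
            return countryName;
        }
    }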
    

    PersonlistProcessor.java:

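    import java.io.InputStream;
    import javax.xml.bind.JAXBContext;
    import javax.xml.bind.Unmarshaller;
    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamConstants;
    import javax.xml.stream.XMLStreamReader;
    
    public class PersonlistProcessor {
        public static void main(String[] args) throws Exception {
            try (InputStream in = PersonlistProcessor.class.getResourceAsStream("personlist.xml")) {
                new PersonlistProcessor().processPersonlist(in);
            }
        }
    
        // TODO: Instead of throws Exception, all exceptions should be wrapped
        // inside a runtime exception
        public void processPersonlist(InputStream inputStream) throws Exception {
            JAXBContext jaxbContext = JAXBContext.newInstance(Person.class);
            Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
            XMLStreamReader xss = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
            // Go to the root element and require <Personlist>
            xss.nextTag();
            xss.require(XMLStreamConstants.START_ELEMENT, "http://example.org", "Personlist");
            // Go to the first <Person> (or straight to </Personlist> if the list is empty)
            xss.nextTag();
            while (xss.getEventType() == XMLStreamConstants.START_ELEMENT) {
                // Require <Person>
                xss.require(XMLStreamConstants.START_ELEMENT, "http://example.org", "Person");
                // Unmarshal one complete <Person> subtree; afterwards the reader points
                // at the event right after </Person>, not at </Person> itself
                Person person = (Person) unmarshaller.unmarshal(xss);
                processPerson(person);
                // Skip whitespace/comments so the loop sees the next tag, without
                // jumping over a <Person> that follows </Person> directly
                if (!xss.isStartElement() && !xss.isEndElement()) {
                    xss.nextTag();
                }
            }
            // Require </Personlist>
            xss.require(XMLStreamConstants.END_ELEMENT, "http://example.org", "Personlist");
            xss.close();
        }
    
        private void processPerson(Person person) {
            System.out.println(person.getName());
            System.out.println(person.getAddress().getCountryName());
        }
    }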
    
  • 2021-02-06 05:55

    With SAX, the parser "pushes" events at your handler, so you have to do all the state housekeeping yourself, as you are doing now. An alternative is StAX (the javax.xml.stream package), which is still streaming, but your code is responsible for "pulling" events from the parser. That way, the logic of which elements are expected in which order is encoded in the control flow of your program rather than in explicit boolean flags.
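    A minimal sketch of that pull style, using only the JDK's javax.xml.stream API. The document shape (Items containing Item elements with ItemId and ItemName children) is a hypothetical one borrowed from the XOM example further down, and the class and method names are made up for illustration. Each expected element is consumed by an explicit call, so there are no "am I inside X?" flags:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxItemReader {

    // Pull-parses a hypothetical <Items><Item><ItemId/><ItemName/></Item>...</Items>
    // document. The expected element order is encoded in the sequence of calls
    // below instead of in boolean state flags.
    public static List<String> readItems(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newFactory()
                .createXMLStreamReader(new StringReader(xml));
        List<String> result = new ArrayList<>();
        r.nextTag();                                              // to <Items>
        r.require(XMLStreamConstants.START_ELEMENT, null, "Items");
        while (r.nextTag() == XMLStreamConstants.START_ELEMENT) { // <Item> or </Items>
            r.require(XMLStreamConstants.START_ELEMENT, null, "Item");
            String id = readTextElement(r, "ItemId");
            String name = readTextElement(r, "ItemName");
            r.nextTag();                                          // to </Item>
            r.require(XMLStreamConstants.END_ELEMENT, null, "Item");
            result.add(id + ": " + name);
        }
        r.require(XMLStreamConstants.END_ELEMENT, null, "Items");
        r.close();
        return result;
    }

    // Expects the next tag to be <localName>, returns its text content and
    // leaves the reader on the matching end tag.
    private static String readTextElement(XMLStreamReader r, String localName)
            throws XMLStreamException {
        r.nextTag();
        r.require(XMLStreamConstants.START_ELEMENT, null, localName);
        return r.getElementText();
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<Items>\n  <Item><ItemId>1</ItemId><ItemName>Apple</ItemName></Item>\n</Items>";
        for (String line : readItems(xml)) {
            System.out.println(line); // prints "1: Apple"
        }
    }
}
```

    Passing null as the namespace to require() skips the namespace check; with a namespaced document like the Personlist example you would pass the namespace URI instead.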

    Depending on the precise structure of the XML, there may be a "middle way" using a toolkit like XOM, which has a mode of operation where you parse a subtree of the document into a DOM-like object model, process that twig, then throw it away and parse the next one. This suits repetitive documents with many similar elements that can each be processed in isolation: you get the ease of programming against a tree-based API within each twig, but still keep the streaming behaviour that lets you parse huge documents efficiently.

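    import nu.xom.Element;
    import nu.xom.NodeFactory;
    import nu.xom.Nodes;
    
    // Plug the factory into a builder, e.g.
    //   new nu.xom.Builder(new ItemProcessor()).build(new java.io.File("items.xml"));
    // ("items.xml" is only a placeholder file name)
    public class ItemProcessor extends NodeFactory {
      // Returning an empty Nodes tells the builder not to attach the element
      private final Nodes emptyNodes = new Nodes();
    
      @Override
      public Nodes finishMakingElement(Element elt) {
        if ("Item".equals(elt.getLocalName())) {
          // process the Item element here; its children are fully built at this point
          System.out.println(elt.getFirstChildElement("ItemId").getValue()
             + ": " + elt.getFirstChildElement("ItemName").getValue());
    
          // then throw it away so the tree never holds more than one Item
          return emptyNodes;
        } else {
          return super.finishMakingElement(elt);
        }
      }
    }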
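    The NodeFactory approach needs the third-party XOM library. As a rough JDK-only approximation of the same "twig" idea (a swapped-in technique, not XOM's API), the sketch below streams with StAX and hand-builds a small org.w3c.dom subtree for each hypothetical Item element, reusing the Item/ItemId/ItemName names from the XOM example. The class name is made up, and attributes and namespaces are ignored to keep it short:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class TwigProcessor {

    // Streams the document; for every <Item> it builds a small DOM subtree,
    // processes it, and then drops it, so only one twig is in memory at a time.
    public static List<String> process(String xml)
            throws XMLStreamException, ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        XMLStreamReader r = XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        List<String> out = new ArrayList<>();
        while (r.hasNext()) {
            if (r.next() == XMLStreamConstants.START_ELEMENT && "Item".equals(r.getLocalName())) {
                out.add(processItem(buildTwig(r, doc))); // the twig is garbage after this
            }
        }
        r.close();
        return out;
    }

    // Builds a detached DOM element for the subtree starting at the current
    // START_ELEMENT and leaves the reader on the matching END_ELEMENT.
    private static Element buildTwig(XMLStreamReader r, Document doc) throws XMLStreamException {
        Deque<Element> stack = new ArrayDeque<>();
        Element root = doc.createElement(r.getLocalName());
        stack.push(root);
        while (!stack.isEmpty()) {
            switch (r.next()) {
                case XMLStreamConstants.START_ELEMENT: {
                    Element child = doc.createElement(r.getLocalName());
                    stack.peek().appendChild(child);
                    stack.push(child);
                    break;
                }
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    stack.peek().appendChild(doc.createTextNode(r.getText()));
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    stack.pop();
                    break;
                default:
                    break; // comments, processing instructions etc. are skipped
            }
        }
        return root;
    }

    // Tree-style access within a single twig
    private static String processItem(Element item) {
        return text(item, "ItemId") + ": " + text(item, "ItemName");
    }

    private static String text(Element parent, String name) {
        return parent.getElementsByTagName(name).item(0).getTextContent();
    }
}
```

    Memory use is bounded by the largest single Item rather than by the whole document, which is the property the XOM approach is after.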
    

    You can achieve something similar by combining StAX and JAXB: define JAXB-annotated classes that represent your repeating element (Item in this example), create a StAX parser, navigate to the first Item start tag, and then unmarshal one complete Item at a time from the XMLStreamReader.

  • 2021-02-06 05:58

    As others suggested, a StAX model would be a better approach to minimize the memory footprint, since it is a streaming, pull-based model. I have personally used AXIOM (which is used in Apache Axis2) and parsed elements using XPath expressions, which is less verbose than walking through the node elements as you do in the code snippet provided.
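    To illustrate how XPath shortens the node-walking code, here is a small sketch. It deliberately swaps AXIOM for the JDK's own javax.xml.xpath API over a DOM, so it is not AXIOM's API, and unlike a streaming parser it loads the whole document into memory; it only shows the expression style. The Items/Item/ItemId/ItemName document and the class name are hypothetical:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class XPathItemReader {

    public static List<String> readItems(String xml) throws Exception {
        // Builds a full in-memory DOM: fine for small documents only
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // One expression selects every Item, no manual traversal needed
        NodeList items = (NodeList) xpath.evaluate("/Items/Item", doc, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Node item = items.item(i);
            // Relative expressions are evaluated against the current Item node
            String id = xpath.evaluate("ItemId", item);
            String name = xpath.evaluate("ItemName", item);
            result.add(id + ": " + name);
        }
        return result;
    }
}
```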
