I've been parsing XML like this for years, and I have to admit that when the number of different elements grows I find it a bit boring and exhausting to do. Here is what I m
Here's an example of using JAXB with StAX.
Input document:
<?xml version="1.0" encoding="UTF-8"?>
<Personlist xmlns="http://example.org">
    <Person>
        <Name>Name 1</Name>
        <Address>
            <StreetAddress>Somestreet</StreetAddress>
            <PostalCode>00001</PostalCode>
            <CountryName>Finland</CountryName>
        </Address>
    </Person>
    <Person>
        <Name>Name 2</Name>
        <Address>
            <StreetAddress>Someotherstreet</StreetAddress>
            <PostalCode>43400</PostalCode>
            <CountryName>Sweden</CountryName>
        </Address>
    </Person>
</Personlist>
Person.java:
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name = "Person", namespace = "http://example.org")
public class Person {

    @XmlElement(name = "Name", namespace = "http://example.org")
    private String name;

    @XmlElement(name = "Address", namespace = "http://example.org")
    private Address address;

    public String getName() {
        return name;
    }

    public Address getAddress() {
        return address;
    }
}
Address.java:
import javax.xml.bind.annotation.XmlElement;

public class Address {

    @XmlElement(name = "StreetAddress", namespace = "http://example.org")
    private String streetAddress;

    @XmlElement(name = "PostalCode", namespace = "http://example.org")
    private String postalCode;

    @XmlElement(name = "CountryName", namespace = "http://example.org")
    private String countryName;

    public String getStreetAddress() {
        return streetAddress;
    }

    public String getPostalCode() {
        return postalCode;
    }

    public String getCountryName() {
        return countryName;
    }
}
PersonlistProcessor.java:
import java.io.InputStream;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class PersonlistProcessor {

    public static void main(String[] args) throws Exception {
        new PersonlistProcessor().processPersonlist(PersonlistProcessor.class
                .getResourceAsStream("personlist.xml"));
    }

    // TODO: Instead of throws Exception, all exceptions should be wrapped
    // inside runtime exception
    public void processPersonlist(InputStream inputStream) throws Exception {
        JAXBContext jaxbContext = JAXBContext.newInstance(Person.class);
        XMLStreamReader xss = XMLInputFactory.newFactory().createXMLStreamReader(inputStream);
        // Create unmarshaller
        Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
        // Go to the root element's start tag
        xss.nextTag();
        // Require Personlist
        xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Personlist");
        // Go to the next tag: either a Person start tag or the Personlist end tag
        while (xss.nextTag() == XMLStreamReader.START_ELEMENT) {
            // Require Person
            xss.require(XMLStreamReader.START_ELEMENT, "http://example.org", "Person");
            // Unmarshal one person. This leaves the reader on the event right
            // after </Person>, so the loop relies on whitespace between
            // consecutive Person elements for nextTag() to skip.
            Person person = (Person) unmarshaller.unmarshal(xss);
            // Process person
            processPerson(person);
        }
        // Require Personlist
        xss.require(XMLStreamReader.END_ELEMENT, "http://example.org", "Personlist");
    }

    private void processPerson(Person person) {
        System.out.println(person.getName());
        System.out.println(person.getAddress().getCountryName());
    }
}
In SAX the parser "pushes" events at your handler, so you have to do all the state housekeeping yourself, as you are doing here. An alternative would be StAX (the javax.xml.stream package), which is still streaming, but your code is responsible for "pulling" events from the parser. This way the logic of which elements are expected in which order is encoded in the control flow of your program rather than having to be represented explicitly in boolean flags.
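To make the "control flow instead of booleans" point concrete, here is a minimal pull-parsing sketch that uses only javax.xml.stream. The document layout (an Items root containing Item elements, each with an ItemId followed by an ItemName, no namespaces) is an assumption made for illustration, as are the class and method names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical example class; the Items/Item/ItemId/ItemName layout is assumed.
public class PullParserSketch {

    // Returns one "id: name" string per Item element.
    public static List<String> readItems(String xml) {
        List<String> result = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newFactory()
                    .createXMLStreamReader(new StringReader(xml));
            // Move from START_DOCUMENT to the root start tag and check its name
            reader.nextTag();
            reader.require(XMLStreamConstants.START_ELEMENT, null, "Items");
            // Each iteration consumes exactly one <Item>...</Item>
            while (reader.nextTag() == XMLStreamConstants.START_ELEMENT) {
                reader.require(XMLStreamConstants.START_ELEMENT, null, "Item");
                reader.nextTag();
                reader.require(XMLStreamConstants.START_ELEMENT, null, "ItemId");
                String id = reader.getElementText(); // reader is now on </ItemId>
                reader.nextTag();
                reader.require(XMLStreamConstants.START_ELEMENT, null, "ItemName");
                String name = reader.getElementText(); // reader is now on </ItemName>
                reader.nextTag();
                reader.require(XMLStreamConstants.END_ELEMENT, null, "Item");
                result.add(id + ": " + name);
            }
            // The loop ended on an end tag, which must close the root
            reader.require(XMLStreamConstants.END_ELEMENT, null, "Items");
            reader.close();
        } catch (XMLStreamException e) {
            // Wrap the checked exception so callers need no throws clause
            throw new IllegalArgumentException("Unexpected document structure", e);
        }
        return result;
    }

    public static void main(String[] args) {
        String xml = "<Items>"
                + "<Item><ItemId>1</ItemId><ItemName>Widget</ItemName></Item>"
                + "<Item><ItemId>2</ItemId><ItemName>Gadget</ItemName></Item>"
                + "</Items>";
        readItems(xml).forEach(System.out::println);
    }
}
```

The expected order of elements is expressed by the sequence of nextTag() and require() calls, so no state flags are needed; a document that deviates from that order fails at the corresponding require() call.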
Depending on the precise structure of the XML there may be a "middle way" using a toolkit like XOM, which has a mode of operation where you parse a subtree of the document into a DOM-like object model, process that twig, then throw it away and parse the next one. This is good for repetitive documents with many similar elements that can each be processed in isolation - you get the ease of programming to a tree-based API within each twig but still have the streaming behaviour that lets you parse huge documents efficiently.
import nu.xom.Element;
import nu.xom.NodeFactory;
import nu.xom.Nodes;

public class ItemProcessor extends NodeFactory {

    // Returned in place of a processed Item so it is never added to the tree
    private Nodes emptyNodes = new Nodes();

    @Override
    public Nodes finishMakingElement(Element elt) {
        if ("Item".equals(elt.getLocalName())) {
            // process the Item element here
            System.out.println(elt.getFirstChildElement("ItemId").getValue()
                    + ": " + elt.getFirstChildElement("ItemName").getValue());
            // then throw it away
            return emptyNodes;
        } else {
            return super.finishMakingElement(elt);
        }
    }
}
You can achieve a similar thing with a combination of StAX and JAXB: define JAXB-annotated classes that represent your repeating element (Item in this example), then create a StAX parser, navigate to the first Item start tag, and unmarshal one complete Item at a time from the XMLStreamReader.
As others suggested, a StAX model would be a better approach to minimize the memory footprint, since it is a pull-based streaming model. I have personally used Axiom (which is used in Apache Axis2) and parsed elements using XPath expressions, which is less verbose than walking through node elements as you have done in the code snippet provided.
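As a rough illustration of how compact XPath-based selection can be, here is a sketch that uses the JDK's built-in javax.xml.xpath API instead of Axiom. Note that this swaps in a different technique: it parses the whole document into a DOM first, so it shows only the brevity of XPath and not the memory savings of a streaming model. The class name, method name, and Items/Item/ItemName layout are assumptions for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical example class; uses the JDK's DOM-based XPath, not Axiom.
public class XPathSketch {

    // Returns the text content of every node matched by the expression.
    public static List<String> select(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Loads the entire document into memory (not streaming)
            Document document = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expression, document, XPathConstants.NODESET);
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (ParserConfigurationException | SAXException | IOException
                | XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<Items>"
                + "<Item><ItemId>1</ItemId><ItemName>Widget</ItemName></Item>"
                + "<Item><ItemId>2</ItemId><ItemName>Gadget</ItemName></Item>"
                + "</Items>";
        // A single expression replaces the nested element-by-element checks
        select(xml, "/Items/Item/ItemName").forEach(System.out::println);
    }
}
```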