Not new to Java; but relatively new to XML-parsing. I know a tiny bit about a lot of the XML tools out there, but not much about any of them. I am also not an XML-pro.
If your XML documents are relatively small (as appears to be the case here), I would use the DOM framework and XPath class. Here is some boilerplate DOM/XPath code from one of my tutorials:
File xmlFile = ...
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
Document doc = db.parse(xmlFile);
XPath xp = XPathFactory.newInstance().newXPath();
String value = xp.evaluate("/path/to/element/text()", doc);
// .. reuse xp to get other values as required
In other words, basically you:
get your XML into a Document object, via a DocumentBuilder;
create an XPath object;
repeatedly call XPath.evaluate(), passing in the path of the element(s) required and your Document.
As you see, there's a little bit of fiddliness in getting hold of your Document object and like all good XML APIs, it throws a plethora of silly pointless checked exceptions. But apart from that, it's fairly no-nonsense for parsing simple small to medium XML documents whose structure is relatively fixed.
OK, so I settled on a solution that (to me) seemed to address my needs in the most reasonable way. My apologies to the other suggestions, but I just liked this route better because it kept most of the parsing-rules as annotations and what little procedural-code I had to write was very minimal.
I ended up going with JAXB; initially I thought JAXB would either create XML from a Java-class or parse XML into a Java-class but only with an XSD. Then I discovered that JAXB has annotations that can parse XML into a Java-class without an XSD.
The XML-file I'm working with is huge and very deep, but I only need bits and bites of it here and there; I was worried that navigating what maps to where in the future would be very difficult. So I chose to structure a tree of folders modeled after the XML... each folder maps to an element and in each folder is a POJO representing that actual element.
Problem is, sometimes there is an element who has a child-element several levels down which has a single property I care about. It would be a pain to create 4 nested-folders and a POJO for each just to get access to a single property. But that's how you do it with JAXB (at least, from what I can tell); once again I was in a corner.
Then I stumbled on EclipseLink's JAXB-implementation: Moxy. Moxy has an @XPath annotation that I could place in that parent POJO and use to navigate several levels down to get access to a single property without creating all those folders and element-POJOs. Nice.
So I created something like this: (note: I chose to use getters for cases where I need to massage the value)
// maps to the root-"xml" element in the file
@XmlRootElement( name="xml" )
@XmlAccessorType( XmlAccessType.FIELD )
public class Xml {
// this is standard JAXB
@XmlElement;
private Item item;
public Item getItem() {
return this.item;
}
...
}
// maps to the "<xml><item>"-element in the file
public class Item {
// standard JAXB; maps to "<xml><item id="...">"
@XmlAttribute
private String id;
public String getId() {
return this.id;
}
// getting an attribute buried deep down
// MOXY; maps to "<xml><item><rating average="...">"
@XmlPath( "rating/@average" )
private Double averageRating;
public Double getAverageRating() {
return this.average;
}
// getting a list buried deep down
// MOXY; maps to "<xml><item><service><identification><aliases><alias.../><alias.../>"
@XmlPath( "service/identification/aliases/alias/text()" )
private List<String> aliases;
public List<String> getAliases() {
return this.aliases;
}
// using a getter to massage the value
@XmlElement(name="dateforindex")
private String dateForIndex;
public Date getDateForIndex() {
// logic to parse the string-value into a Date
}
}
Also note that I took the route of separating the XML-object from the model-object I actually use in the app. Thus, I have a factory that transforms these crude objects into much more robust objects which I actually use in my app.
You can use SAXParser or STAXParser. If you can afford some more amount of memory, then you can also afford to use DOMParser. I would advise STAXParser would be best for you.