I\'ve been parsing XML like this for years, and I have to admit when the number of different element becomes larger I find it a bit boring and exhausting to do, here is what I m
Solution without using outside package, or even XPath: use an enum
"PARSE_MODE", probably in combination with a Stack
:
1) The basic solution:
a) fields
private PARSE_MODE parseMode = PARSE_MODE.__UNDEFINED__;
// NB: essential that all these enum values are upper case, but this is the convention anyway
private enum PARSE_MODE {
__UNDEFINED__, ORDER, DATE, CUSTOMERID, ITEM };
private List parseModeStrings = new ArrayList();
private Stack modeBreadcrumbs = new Stack();
b) make your List
, maybe in the constructor:
for( PARSE_MODE pm : PARSE_MODE.values() ){
// might want to check here that these are indeed upper case
parseModeStrings.add( pm.name() );
}
c) startElement
and endElement
:
@Override
public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
String localNameUC = localName.toUpperCase();
// pushing "__UNDEFINED__" would mess things up! But unlikely name for an XML element
assert ! localNameUC.equals( "__UNDEFINED__" );
if( parseModeStrings.contains( localNameUC )){
parseMode = PARSE_MODE.valueOf( localNameUC );
// any "policing" to do with which modes are allowed to switch into
// other modes could be put here...
// in your case, go `new Order()` here when parseMode == ORDER
modeBreadcrumbs.push( parseMode );
}
else {
// typically ignore the start of this element...
}
}
@Override
private void endElement(String uri, String localName, String qName) throws Exception {
String localNameUC = localName.toUpperCase();
if( parseModeStrings.contains( localNameUC )){
// will not fail unless XML structure which is malformed in some way
// or coding error in use of the Stack, etc.:
assert modeBreadcrumbs.pop() == parseMode;
if( modeBreadcrumbs.empty() ){
parseMode = PARSE_MODE.__UNDEFINED__;
}
else {
parseMode = modeBreadcrumbs.peek();
}
}
else {
// typically ignore the end of this element...
}
}
... so what does this all mean? At any one time you have knowledge of the "parse mode" you're in ... and you can also look at the Stack
if you need to find out what other parse modes you passed through to get here...
Your characters
method then becomes substantially cleaner:
public void characters(char[] ch, int start, int length) throws SAXException {
switch( parseMode ){
case DATE:
// PS - this SimpleDateFormat object can be a field: it doesn't need to be created hundreds of times
SimpleDateFormat formatter. ...
String value = ...
...
break;
case CUSTOMERID:
order.setCustomerId( ...
break;
case ITEM:
item = new Item();
// this next line probably won't be needed: when you get to endElement, if
// parseMode is ITEM, the previous mode will be restored automatically
// isItem = false ;
}
}
2) The more "professional" solution:
abstract
class which concrete classes have to extend and which then have no ability to modify the Stack
, etc. NB this examines qName
rather than localName
. Thus:
public abstract class AbstractSAXHandler extends DefaultHandler {
protected enum PARSE_MODE implements SAXHandlerParseMode {
__UNDEFINED__
};
// abstract: the concrete subclasses must populate...
abstract protected Collection> getPossibleModes();
//
private Stack modeBreadcrumbs = new Stack();
private Collection> possibleModes;
private Map> nameToEnumMap;
private Map> getNameToEnumMap(){
// lazy creation and population of map
if( nameToEnumMap == null ){
if( possibleModes == null ){
possibleModes = getPossibleModes();
}
nameToEnumMap = new HashMap>();
for( Enum> possibleMode : possibleModes ){
nameToEnumMap.put( possibleMode.name(), possibleMode );
}
}
return nameToEnumMap;
}
protected boolean isLegitimateModeName( String name ){
return getNameToEnumMap().containsKey( name );
}
protected SAXHandlerParseMode getParseMode() {
return modeBreadcrumbs.isEmpty()? PARSE_MODE.__UNDEFINED__ : modeBreadcrumbs.peek();
}
@Override
public void startElement(String uri, String localName, String qName, Attributes attributes)
throws SAXException {
try {
_startElement(uri, localName, qName, attributes);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
// override in subclasses (NB I think caught Exceptions are not a brilliant design choice in Java)
protected void _startElement(String uri, String localName, String qName, Attributes attributes)
throws Exception {
String qNameUC = qName.toUpperCase();
// very undesirable ever to push "UNDEFINED"! But unlikely name for an XML element
assert !qNameUC.equals("__UNDEFINED__") : "Encountered XML element with qName \"__UNDEFINED__\"!";
if( getNameToEnumMap().containsKey( qNameUC )){
Enum> newMode = getNameToEnumMap().get( qNameUC );
modeBreadcrumbs.push( (SAXHandlerParseMode)newMode );
}
}
@Override
public void endElement(String uri, String localName, String qName) throws SAXException {
try {
_endElement(uri, localName, qName);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
// override in subclasses
protected void _endElement(String uri, String localName, String qName) throws Exception {
String qNameUC = qName.toUpperCase();
if( getNameToEnumMap().containsKey( qNameUC )){
modeBreadcrumbs.pop();
}
}
public List> showModeBreadcrumbs(){
return org.apache.commons.collections4.ListUtils.unmodifiableList( modeBreadcrumbs );
}
}
interface SAXHandlerParseMode {
}
Then, salient part of concrete subclass:
private enum PARSE_MODE implements SAXHandlerParseMode {
ORDER, DATE, CUSTOMERID, ITEM
};
private Collection> possibleModes;
@Override
protected Collection> getPossibleModes() {
// lazy initiation
if (possibleModes == null) {
List parseModes = new ArrayList( Arrays.asList(PARSE_MODE.values()) );
possibleModes = new ArrayList>();
for( SAXHandlerParseMode parseMode : parseModes ){
possibleModes.add( PARSE_MODE.valueOf( parseMode.toString() ));
}
// __UNDEFINED__ mode (from abstract superclass) must be added afterwards
possibleModes.add( AbstractSAXHandler.PARSE_MODE.__UNDEFINED__ );
}
return possibleModes;
}
PS this is a starting point for more sophisticated stuff: for example, you might set up a List
which is kept synchronised with the Stack
: the Objects
could then be anything you want, enabling you to "reach back" into the ascendant "XML nodes" of the one you're dealing with. Don't use a Map
, though: the Stack
can potentially contain the same PARSE_MODE
object more than once. This in fact illustrates a fundamental characteristic of all tree-like structures: no individual node (here: parse mode) exists in isolation: its identity is always defined by the entire path leading to it.