Better way to parse XML

感动是毒 2021-02-06 05:05

I've been parsing XML like this for years, and I have to admit that when the number of different elements grows, I find it a bit boring and exhausting to do. Here is what I m

9 answers
  •  粉色の甜心
    2021-02-06 05:50

    A solution that needs no external package, or even XPath: use an enum, PARSE_MODE, probably in combination with a Stack:

    1) The basic solution:

    a) fields

    private PARSE_MODE parseMode = PARSE_MODE.__UNDEFINED__;
    // NB: essential that all these enum values are upper case, but this is the convention anyway
    private enum PARSE_MODE {
        __UNDEFINED__, ORDER, DATE, CUSTOMERID, ITEM };
    private List<String> parseModeStrings = new ArrayList<String>();
    private Stack<PARSE_MODE> modeBreadcrumbs = new Stack<PARSE_MODE>();
    

    b) make your List, maybe in the constructor:

        for( PARSE_MODE pm : PARSE_MODE.values() ){
            // might want to check here that these are indeed upper case
            parseModeStrings.add( pm.name() );
        }
    

    c) startElement and endElement:

    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
        String localNameUC = localName.toUpperCase();
        // pushing "__UNDEFINED__" would mess things up! But unlikely name for an XML element
        assert ! localNameUC.equals( "__UNDEFINED__" );
    
        if( parseModeStrings.contains( localNameUC )){
            parseMode = PARSE_MODE.valueOf( localNameUC );
            // any "policing" to do with which modes are allowed to switch into 
            // other modes could be put here... 
            // in your case, go `new Order()` here when parseMode == ORDER
            modeBreadcrumbs.push( parseMode );
        } 
        else {
           // typically ignore the start of this element...
        }
    }   
    
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String localNameUC = localName.toUpperCase();
        if( parseModeStrings.contains( localNameUC )){
            // will not fail unless the XML structure is malformed in some way
            // or there is a coding error in the use of the Stack, etc.
            // (pop outside the assert, so it still happens when assertions are disabled)
            boolean poppedCurrentMode = modeBreadcrumbs.pop() == parseMode;
            assert poppedCurrentMode;
            if( modeBreadcrumbs.empty() ){
                parseMode = PARSE_MODE.__UNDEFINED__;
            }
            else {
                parseMode = modeBreadcrumbs.peek();
            }
        } 
        else {
           // typically ignore the end of this element...
        }
    
    }
    

    ... so what does this all mean? At any one time you have knowledge of the "parse mode" you're in ... and you can also look at the Stack modeBreadcrumbs if you need to find out what other parse modes you passed through to get here...
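
To make the mechanics concrete, here is a minimal, self-contained sketch of the basic idea. It is not the exact code above: the class name `ParseModeDemo`, the sample XML and the `seen` list are made up for illustration, `ArrayDeque` stands in for `Stack`, and it matches on `qName` because a default `SAXParserFactory` is not namespace-aware, in which case `localName` can come back empty.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ParseModeDemo extends DefaultHandler {
    enum ParseMode { UNDEFINED, ORDER, DATE, CUSTOMERID, ITEM }

    private final Deque<ParseMode> breadcrumbs = new ArrayDeque<>();
    // Each entry is "MODE@depth=text", so the effect of the stack is visible.
    final List<String> seen = new ArrayList<>();

    private ParseMode currentMode() {
        return breadcrumbs.isEmpty() ? ParseMode.UNDEFINED : breadcrumbs.peek();
    }

    // Returns the matching mode, or null for elements that are not tracked.
    private static ParseMode modeFor(String qName) {
        try {
            ParseMode mode = ParseMode.valueOf(qName.toUpperCase(Locale.ROOT));
            return mode == ParseMode.UNDEFINED ? null : mode;
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        ParseMode mode = modeFor(qName);
        if (mode != null) {
            breadcrumbs.push(mode);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (modeFor(qName) != null) {
            breadcrumbs.pop(); // the enclosing mode becomes current again
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty()) {
            seen.add(currentMode() + "@" + breadcrumbs.size() + "=" + text);
        }
    }

    // Convenience wrapper: parses an XML string and returns what was recorded.
    static List<String> parse(String xml) {
        try {
            ParseModeDemo handler = new ParseModeDemo();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.seen;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<order><date>2021-02-06</date><customerid>42</customerid>"
                + "<item><name>Widget</name></item></order>";
        System.out.println(parse(xml));
    }
}
```

Note that the untracked `<name>` element leaves the mode untouched, so its text is reported under the enclosing `ITEM` mode.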

    Your characters method then becomes substantially cleaner:

    public void characters(char[] ch, int start, int length) throws SAXException {
        switch( parseMode ){
        case DATE:
            // PS - this SimpleDateFormat object can be a field: it doesn't need to be created hundreds of times
            SimpleDateFormat formatter. ...
            String value = ...
            ...
            break;
    
        case CUSTOMERID:
            order.setCustomerId( ...
            break;
    
        case ITEM:
            item = new Item();
            // this next line probably won't be needed: when you get to endElement, if 
            // parseMode is ITEM, the previous mode will be restored automatically
            // isItem = false ;
        }
    
    }
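
One caveat worth sketching: a SAX parser is allowed to deliver the text of a single element across several `characters` calls, so a common precaution is to buffer the text and act on it in `endElement`. The following is only a sketch under assumptions: the class name `BufferedTextDemo`, the `yyyy-MM-dd` date pattern and the two result fields are hypothetical, and the mode stack is left out to keep the focus on buffering.

```java
import java.io.StringReader;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class BufferedTextDemo extends DefaultHandler {
    enum ParseMode { ORDER, DATE, CUSTOMERID, ITEM }

    // Created once as a field rather than per call. The pattern is an assumption.
    private final SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd");
    private final StringBuilder text = new StringBuilder();

    Date date;          // filled from <date>
    String customerId;  // filled from <customerid>

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length); // may run several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String value = text.toString().trim();
        text.setLength(0);
        ParseMode mode;
        try {
            mode = ParseMode.valueOf(qName.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return; // not an element we care about
        }
        switch (mode) {
            case DATE:
                try {
                    date = formatter.parse(value);
                } catch (ParseException e) {
                    throw new SAXException("Bad date: " + value, e);
                }
                break;
            case CUSTOMERID:
                customerId = value;
                break;
            default:
                break; // ORDER and ITEM carry no text of their own here
        }
    }

    // Convenience wrapper: parses an XML string and returns the filled handler.
    static BufferedTextDemo parse(String xml) {
        try {
            BufferedTextDemo handler = new BufferedTextDemo();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Calling `BufferedTextDemo.parse(xml)` on a small order document fills `date` and `customerId` only once each element has been read completely.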
    

    2) The more "professional" solution:
    an abstract class which concrete classes have to extend, and which gives them no ability to modify the Stack, etc. NB this examines qName rather than localName. Thus:

    public abstract class AbstractSAXHandler extends DefaultHandler {
        protected enum PARSE_MODE implements SAXHandlerParseMode {
            __UNDEFINED__
        };
        // abstract: the concrete subclasses must populate...
        abstract protected Collection<Enum<?>> getPossibleModes();
        // 
        private Stack<SAXHandlerParseMode> modeBreadcrumbs = new Stack<SAXHandlerParseMode>();
        private Collection<Enum<?>> possibleModes;
        private Map<String, Enum<?>> nameToEnumMap;
        private Map<String, Enum<?>> getNameToEnumMap(){
            // lazy creation and population of map
            if( nameToEnumMap == null ){
                if( possibleModes == null ){
                    possibleModes = getPossibleModes();
                }
                nameToEnumMap = new HashMap<String, Enum<?>>();
                for( Enum<?> possibleMode : possibleModes ){
                    nameToEnumMap.put( possibleMode.name(), possibleMode ); 
                }
            }
            return nameToEnumMap;
        }
    
        protected boolean isLegitimateModeName( String name ){
            return getNameToEnumMap().containsKey( name );
        }
    
        protected SAXHandlerParseMode getParseMode() {
            return modeBreadcrumbs.isEmpty()? PARSE_MODE.__UNDEFINED__ : modeBreadcrumbs.peek();
        }
    
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes)
                throws SAXException {
            try {
                _startElement(uri, localName, qName, attributes);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    
        // override in subclasses (NB I think checked Exceptions are not a brilliant design choice in Java)
        protected void _startElement(String uri, String localName, String qName, Attributes attributes)
                throws Exception {
            String qNameUC = qName.toUpperCase();
            // very undesirable ever to push "UNDEFINED"! But unlikely name for an XML element
            assert !qNameUC.equals("__UNDEFINED__") : "Encountered XML element with qName \"__UNDEFINED__\"!";
            if( getNameToEnumMap().containsKey( qNameUC )){
                Enum<?> newMode = getNameToEnumMap().get( qNameUC );
                modeBreadcrumbs.push( (SAXHandlerParseMode)newMode );
            }
        }
    
        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            try {
                _endElement(uri, localName, qName);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    
        // override in subclasses
        protected void _endElement(String uri, String localName, String qName) throws Exception {
            String qNameUC = qName.toUpperCase();
            if( getNameToEnumMap().containsKey( qNameUC )){
                modeBreadcrumbs.pop(); 
            }
        }
    
        public List<SAXHandlerParseMode> showModeBreadcrumbs(){
            // JDK equivalent of the commons-collections call, so no outside package is needed
            return java.util.Collections.unmodifiableList( modeBreadcrumbs );
        }
    
    }
    
    interface SAXHandlerParseMode {
    
    }
    

    Then, the salient part of a concrete subclass:

    private enum PARSE_MODE implements SAXHandlerParseMode {
        ORDER, DATE, CUSTOMERID, ITEM
    };
    
    private Collection<Enum<?>> possibleModes;
    
    @Override
    protected Collection<Enum<?>> getPossibleModes() {
        // lazy initialisation
        if (possibleModes == null) {
            List<SAXHandlerParseMode> parseModes = new ArrayList<SAXHandlerParseMode>( Arrays.asList(PARSE_MODE.values()) );
            possibleModes = new ArrayList<Enum<?>>();
            for( SAXHandlerParseMode parseMode : parseModes ){
                possibleModes.add( PARSE_MODE.valueOf( parseMode.toString() ));
            }
            // __UNDEFINED__ mode (from abstract superclass) must be added afterwards
            possibleModes.add( AbstractSAXHandler.PARSE_MODE.__UNDEFINED__ );
        }
        return possibleModes;
    }
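
For comparison, the same separation of duties can be sketched with a generic type parameter instead of a marker interface. This is a hypothetical variant, not the code above: `ModeTrackingHandler`, `OrderMode`, `OrderHandler`, and the `onEnter`/`onExit` hooks are invented names, and `ArrayDeque` replaces `Stack`. The element callbacks are `final`, so a subclass can read the breadcrumbs but cannot alter them.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// M is the subclass's own enum of parse modes.
abstract class ModeTrackingHandler<M extends Enum<M>> extends DefaultHandler {
    private final Map<String, M> nameToMode = new HashMap<>();
    private final Deque<M> breadcrumbs = new ArrayDeque<>();

    protected ModeTrackingHandler(Class<M> modeType) {
        for (M mode : modeType.getEnumConstants()) {
            nameToMode.put(mode.name(), mode);
        }
    }

    // Innermost tracked mode, or null when outside every tracked element.
    protected final M currentMode() {
        return breadcrumbs.peek();
    }

    // Read-only copy of the breadcrumbs, outermost mode first.
    protected final List<M> path() {
        List<M> copy = new ArrayList<>(breadcrumbs);
        Collections.reverse(copy);
        return Collections.unmodifiableList(copy);
    }

    // Hooks for subclasses; the stack itself stays private.
    protected void onEnter(M mode, Attributes atts) { }
    protected void onExit(M mode) { }

    @Override
    public final void startElement(String uri, String localName, String qName, Attributes atts) {
        M mode = nameToMode.get(qName.toUpperCase(Locale.ROOT));
        if (mode != null) {
            breadcrumbs.push(mode);
            onEnter(mode, atts);
        }
    }

    @Override
    public final void endElement(String uri, String localName, String qName) {
        M mode = nameToMode.get(qName.toUpperCase(Locale.ROOT));
        if (mode != null) {
            onExit(mode);
            breadcrumbs.pop();
        }
    }
}

enum OrderMode { ORDER, DATE, CUSTOMERID, ITEM }

class OrderHandler extends ModeTrackingHandler<OrderMode> {
    final List<String> events = new ArrayList<>();

    OrderHandler() {
        super(OrderMode.class);
    }

    @Override
    protected void onEnter(OrderMode mode, Attributes atts) {
        if (mode == OrderMode.ITEM) {
            events.add("item at " + path());
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        String text = new String(ch, start, length).trim();
        if (!text.isEmpty() && currentMode() == OrderMode.CUSTOMERID) {
            events.add("customer " + text);
        }
    }

    static List<String> parse(String xml) {
        try {
            OrderHandler handler = new OrderHandler();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.events;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The trade-off is that each handler is tied to a single enum type, whereas the marker-interface version above can mix enums from the superclass and the subclass.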
    

    PS this is a starting point for more sophisticated stuff: for example, you might set up a List of Objects which is kept synchronised with the Stack. The Objects could then be anything you want, enabling you to "reach back" into the ancestor "XML nodes" of the one you're dealing with. Don't use a Map, though: the Stack can potentially contain the same PARSE_MODE object more than once. This in fact illustrates a fundamental characteristic of all tree-like structures: no individual node (here: parse mode) exists in isolation; its identity is always defined by the entire path leading to it.
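
A rough sketch of that idea follows. Instead of a separate List, it keeps one stack of small frames, each pairing a mode with an arbitrary object, which keeps the two in step by construction. Everything here is assumed for illustration: the names `AncestorStackDemo` and `Frame`, the `id` attribute, and the nested `<item>` elements in the sample document.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class AncestorStackDemo extends DefaultHandler {
    enum Mode { ORDER, ITEM }

    // A mode plus whatever object belongs to that element (here: its id attribute).
    static final class Frame {
        final Mode mode;
        final Object payload;

        Frame(Mode mode, Object payload) {
            this.mode = mode;
            this.payload = payload;
        }
    }

    private final Deque<Frame> frames = new ArrayDeque<>();
    final List<String> itemPaths = new ArrayList<>();

    // Returns the matching mode, or null for elements that are not tracked.
    private static Mode modeFor(String qName) {
        try {
            return Mode.valueOf(qName.toUpperCase(Locale.ROOT));
        } catch (IllegalArgumentException e) {
            return null;
        }
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Mode mode = modeFor(qName);
        if (mode == null) {
            return;
        }
        frames.push(new Frame(mode, atts.getValue("id")));
        if (mode == Mode.ITEM) {
            itemPaths.add(describePath());
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (modeFor(qName) != null) {
            frames.pop();
        }
    }

    // Walks from the outermost frame to the current one, reaching back into ancestors.
    private String describePath() {
        StringBuilder sb = new StringBuilder();
        Iterator<Frame> it = frames.descendingIterator();
        while (it.hasNext()) {
            Frame frame = it.next();
            if (sb.length() > 0) {
                sb.append('/');
            }
            sb.append(frame.mode).append('[').append(frame.payload).append(']');
        }
        return sb.toString();
    }

    static List<String> parse(String xml) {
        try {
            AncestorStackDemo handler = new AncestorStackDemo();
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
            return handler.itemPaths;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With nested items the stack holds `ITEM` twice, which is why a Map keyed by mode would lose information while the full path still identifies each node.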
