Better way to parse XML

感动是毒 2021-02-06 05:05

I've been parsing XML like this for years, and I have to admit that when the number of different elements grows, I find it boring and exhausting to do. Here is what I m

9 Answers
  • 2021-02-06 05:37

    I've been using this library. It sits on top of the standard Java library and makes things easier for me. In particular, you can ask for a specific element or attribute by name, rather than using the big "if" statement you've described.

    http://marketmovers.blogspot.com/2014/02/the-easy-way-to-read-xml-in-java.html
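
    For comparison, here is a minimal sketch of by-name lookup using only the JDK's DOM API. It does not use the linked library, whose API is not shown in this answer. The `ByNameLookup` class, the `childText` helper and the sample `<Order>` document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical example: fetch elements and attributes by name with the JDK DOM API.
final class ByNameLookup {

    // Parses an XML string into a DOM Document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Returns the text of the first descendant element with the given tag name,
    // or null if there is no such element.
    static String childText(Element parent, String tagName) {
        NodeList matches = parent.getElementsByTagName(tagName);
        return matches.getLength() == 0 ? null : matches.item(0).getTextContent();
    }

    public static void main(String[] args) {
        String xml = "<Order id=\"7\"><CustomerId>222</CustomerId>"
                + "<CustomerName>Ann</CustomerName></Order>";
        Element order = parse(xml).getDocumentElement();
        System.out.println(order.getAttribute("id"));
        System.out.println(childText(order, "CustomerId"));
        System.out.println(childText(order, "CustomerName"));
    }
}
```

    This avoids the long chain of `if` tests, at the cost of loading the whole document into memory.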

  • 2021-02-06 05:38
    import java.io.File;
    import java.util.ArrayList;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.transform.Transformer;
    import javax.xml.transform.TransformerFactory;
    import javax.xml.transform.dom.DOMSource;
    import javax.xml.transform.stream.StreamResult;
    import javax.xml.xpath.XPath;
    import javax.xml.xpath.XPathConstants;
    import javax.xml.xpath.XPathExpression;
    import javax.xml.xpath.XPathFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.NodeList;
    
    public class JXML {
    private DocumentBuilder builder;
    private Document doc = null;
    private DocumentBuilderFactory factory ;
    private XPathExpression expr = null;
    private XPathFactory xFactory;
    private XPath xpath;
    private String xmlFile;
    public static ArrayList<String> XMLVALUE ;  
    
    
    public JXML(String xmlFile){
        this.xmlFile = xmlFile;
    }
    
    
    private void xmlFileSettings(){     
        try {
            factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            xFactory = XPathFactory.newInstance();
            xpath = xFactory.newXPath();
            builder = factory.newDocumentBuilder();
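            // parse(String) expects a URI, so wrap the path in a File (as updateQuery does when writing)
            doc = builder.parse(new File(xmlFile));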
        }
        catch (Exception e){
            System.out.println(e);
        }       
    }
    
    
    
    public String[] selectQuery(String query){
        xmlFileSettings();
        ArrayList<String> records = new ArrayList<String>();
        try {
            expr = xpath.compile(query);
            Object result = expr.evaluate(doc, XPathConstants.NODESET);
            NodeList nodes = (NodeList) result;
            for (int i=0; i<nodes.getLength();i++){             
                records.add(nodes.item(i).getNodeValue());
            }
            return records.toArray(new String[records.size()]);
        } 
        catch (Exception e) {
            System.out.println("Error evaluating query: " + e);
            return records.toArray(new String[records.size()]);
        }       
    }
    
    public boolean updateQuery(String query,String value){
        xmlFileSettings();
        try{
            NodeList nodes = (NodeList) xpath.evaluate(query, doc, XPathConstants.NODESET);
            for (int idx = 0; idx < nodes.getLength(); idx++) {
              nodes.item(idx).setTextContent(value);
            }
            Transformer xformer = TransformerFactory.newInstance().newTransformer();
            xformer.transform(new DOMSource(doc), new StreamResult(new File(this.xmlFile)));
            return true;
        }catch(Exception e){
            System.out.println(e);
            return false;
        }
    }
    
    
    
    
    public static void main(String args[]){
        JXML jxml = new JXML("c://user.xml");
        jxml.updateQuery("//Order/CustomerId/text()","222");
        String result[]=jxml.selectQuery("//Order/Item/*/text()");
        for(int i=0;i<result.length;i++){
            System.out.println(result[i]);
        }
    }
    

    }

  • 2021-02-06 05:39

    I've been using XStream to serialize my own objects to XML and then load them back as Java objects. If you can represent everything as POJOs and you properly annotate the POJOs to match the types in your XML file, you might find it much easier to use.

    When a String represents an object in XML, you can just write:

    Order theOrder = (Order)xstream.fromXML(xmlString);

    I have always used it to load an object into memory in a single line, but if you need to stream it and process as you go you should be able to use a HierarchicalStreamReader to iterate through the document. This might be very similar to Simple, suggested by @Dave.

  • 2021-02-06 05:41

    There is another library that supports more compact XML parsing, RTXML. The library and its documentation are on rasmustorkel.com. I implemented the parsing of the file in the original question, and I am including the complete program here:

    package for_so;
    
    import java.io.File;
    import java.util.ArrayList;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;
    
    import rasmus_torkel.xml_basic.read.TagNode;
    import rasmus_torkel.xml_basic.read.XmlReadOptions;
    import rasmus_torkel.xml_basic.read.impl.XmlReader;
    
    public class Q15626686_ReadOrder
    {
        public static class Order
        {
            public final Date            _date;
            public final int             _customerId;
            public final String          _customerName;
            public final ArrayList<Item> _itemAl;
    
            public
            Order(TagNode node)
            {
                _date = (Date)node.nextStringMappedFieldE("Date", Date.class);
                _customerId = (int)node.nextIntFieldE("CustomerId");
                _customerName = node.nextTextFieldE("CustomerName");
                _itemAl = new ArrayList<Item>();
                boolean finished = false;
                while (!finished)
                {
                    TagNode itemNode = node.nextChildN("Item");
                    if (itemNode != null)
                    {
                        Item item = new Item(itemNode);
                        _itemAl.add(item);
                    }
                    else
                    {
                        finished = true;
                    }
                }
                node.verifyNoMoreChildren();
            }
        }
    
        public static final Pattern DATE_PATTERN = Pattern.compile("^(\\d\\d\\d\\d)\\/(\\d\\d)\\/(\\d\\d)$");
    
        public static class Date
        {
            public final String _dateString;
            public final int    _year;
            public final int    _month;
            public final int    _day;
    
            public
            Date(String dateString)
            {
                _dateString = dateString;
                Matcher matcher = DATE_PATTERN.matcher(dateString);
                if (!matcher.matches())
                {
                    throw new RuntimeException(dateString + " does not match pattern " + DATE_PATTERN.pattern());
                }
                _year = Integer.parseInt(matcher.group(1));
                _month = Integer.parseInt(matcher.group(2));
                _day = Integer.parseInt(matcher.group(3));
            }
        }
    
        public static class Item
        {
            public final int      _itemId;
            public final String   _itemName;
            public final Quantity _quantity;
    
            public
            Item(TagNode node)
            {
                _itemId = node.nextIntFieldE("ItemId");
                _itemName = node.nextTextFieldE("ItemName");
                _quantity = new Quantity(node.nextChildE("Quantity"));
                node.verifyNoMoreChildren();
            }
        }
    
        public static class Quantity
        {
            public final int _unitSize;
            public final int _unitQuantity;
    
            public
            Quantity(TagNode node)
            {
                _unitSize = node.attributeIntD("unit", 1);
                _unitQuantity = node.onlyInt();
            }
        }
    
        public static void
        main(String[] args)
        {
            File xmlFile = new File(args[0]);
            TagNode orderNode = XmlReader.xmlFileToRoot(xmlFile, "Order", XmlReadOptions.DEFAULT);
            Order order = new Order(orderNode);
            System.out.println("Read order for " + order._customerName + " which has " + order._itemAl.size() + " items");
        }
    }
    

    You will notice that the retrieval functions end in N, E or D. They refer to what to do when the desired data item is not there. N stands for return Null, E stands for throw Exception and D stands for use Default.
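
    The suffix convention itself is easy to picture outside the library. The following is a minimal sketch over a plain `Map`; it is not RTXML's API, and the `SuffixConvention` class and its method names are hypothetical.

```java
import java.util.Collections;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical illustration of the N / E / D suffix convention; this is not RTXML's API.
final class SuffixConvention {
    private final Map<String, String> fields;

    SuffixConvention(Map<String, String> fields) {
        this.fields = fields;
    }

    // N: return null when the field is missing.
    String textN(String name) {
        return fields.get(name);
    }

    // E: throw an exception when the field is missing.
    String textE(String name) {
        String value = fields.get(name);
        if (value == null) {
            throw new NoSuchElementException("Missing field: " + name);
        }
        return value;
    }

    // D: fall back to a caller-supplied default when the field is missing.
    int intD(String name, int defaultValue) {
        String value = fields.get(name);
        return value == null ? defaultValue : Integer.parseInt(value);
    }

    public static void main(String[] args) {
        SuffixConvention node = new SuffixConvention(Collections.singletonMap("ItemName", "Widget"));
        System.out.println(node.textE("ItemName")); // present, so no exception
        System.out.println(node.textN("Comment"));  // missing, so null
        System.out.println(node.intD("unit", 1));   // missing, so the default
    }
}
```

    Putting the policy in the method name keeps the missing-data decision visible at each call site.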

  • 2021-02-06 05:50

    If you control the definition of the XML, you could use an XML binding tool, for example JAXB (Java Architecture for XML Binding). In JAXB you can define a schema for the XML structure (XSD and others are supported) or annotate your Java classes to define the serialization rules. Once you have a clear declarative mapping between XML and Java, marshalling and unmarshalling to/from XML becomes trivial.

    Using JAXB does require more memory than SAX handlers, but there are ways to process XML documents in parts; see "Dealing with large documents".

    JAXB page from Oracle

  • 2021-02-06 05:50

    A solution that uses no external package, or even XPath: use an enum "PARSE_MODE", probably in combination with a Stack<PARSE_MODE>:

    1) The basic solution:

    a) fields

    private PARSE_MODE parseMode = PARSE_MODE.__UNDEFINED__;
    // NB: essential that all these enum values are upper case, but this is the convention anyway
    private enum PARSE_MODE {
        __UNDEFINED__, ORDER, DATE, CUSTOMERID, ITEM };
    private List<String> parseModeStrings = new ArrayList<String>();
    private Stack<PARSE_MODE> modeBreadcrumbs = new Stack<PARSE_MODE>();
    

    b) make your List<String>, maybe in the constructor:

        for( PARSE_MODE pm : PARSE_MODE.values() ){
            // might want to check here that these are indeed upper case
            parseModeStrings.add( pm.name() );
        }
    

    c) startElement and endElement:

    @Override
    public void startElement(String namespaceURI, String localName, String qName, Attributes atts) {
        String localNameUC = localName.toUpperCase();
        // pushing "__UNDEFINED__" would mess things up! But unlikely name for an XML element
        assert ! localNameUC.equals( "__UNDEFINED__" );
    
        if( parseModeStrings.contains( localNameUC )){
            parseMode = PARSE_MODE.valueOf( localNameUC );
            // any "policing" to do with which modes are allowed to switch into 
            // other modes could be put here... 
            // in your case, go `new Order()` here when parseMode == ORDER
            modeBreadcrumbs.push( parseMode );
        } 
        else {
           // typically ignore the start of this element...
        }
    }   
    
    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        String localNameUC = localName.toUpperCase();
        if( parseModeStrings.contains( localNameUC )){
            // pop outside the assert: with assertions disabled, the assert expression would never run
            PARSE_MODE poppedMode = modeBreadcrumbs.pop();
            // will not fail unless XML structure which is malformed in some way
            // or coding error in use of the Stack, etc.:
            assert poppedMode == parseMode;
            if( modeBreadcrumbs.empty() ){
                parseMode = PARSE_MODE.__UNDEFINED__;
            }
            else {
                parseMode = modeBreadcrumbs.peek();
            }
        } 
        else {
           // typically ignore the end of this element...
        }
    
    }
    

    ... so what does this all mean? At any one time you have knowledge of the "parse mode" you're in ... and you can also look at the Stack<PARSE_MODE> modeBreadcrumbs if you need to find out what other parse modes you passed through to get here...

    Your characters method then becomes substantially cleaner:

    public void characters(char[] ch, int start, int length) throws SAXException {
        switch( parseMode ){
        case DATE:
            // PS - this SimpleDateFormat object can be a field: it doesn't need to be created hundreds of times
            SimpleDateFormat formatter. ...
            String value = ...
            ...
            break;
    
        case CUSTOMERID:
            order.setCustomerId( ...
            break;
    
        case ITEM:
            item = new Item();
            // this next line probably won't be needed: when you get to endElement, if 
            // parseMode is ITEM, the previous mode will be restored automatically
            // isItem = false ;
        }
    
    }
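
    To see the pieces of the basic solution working together, here is a self-contained sketch under a few assumptions. It is not the answer's exact code: it uses an `ArrayDeque` instead of `Stack`, matches on qName (the default SAX parser is not namespace-aware), and buffers text until `endElement` because SAX may deliver one text node in several `characters` calls. The `ModeStackHandler` class, the `ITEMNAME` mode and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects CustomerId and ItemName values using a parse-mode stack.
final class ModeStackHandler extends DefaultHandler {
    enum Mode { UNDEFINED, ORDER, CUSTOMERID, ITEM, ITEMNAME }

    private final Deque<Mode> breadcrumbs = new ArrayDeque<>();
    private final StringBuilder text = new StringBuilder();
    String customerId;
    final List<String> itemNames = new ArrayList<>();

    // Maps an element name to a mode, or null if the element is not tracked.
    private static Mode modeFor(String qName) {
        String upper = qName.toUpperCase(Locale.ROOT);
        for (Mode mode : Mode.values()) {
            if (mode != Mode.UNDEFINED && mode.name().equals(upper)) {
                return mode;
            }
        }
        return null;
    }

    private Mode currentMode() {
        return breadcrumbs.isEmpty() ? Mode.UNDEFINED : breadcrumbs.peek();
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        Mode mode = modeFor(qName);
        if (mode != null) {
            breadcrumbs.push(mode);
            text.setLength(0); // start collecting text for the new mode
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (modeFor(qName) == null) {
            return; // untracked element: ignore its end
        }
        switch (currentMode()) {
            case CUSTOMERID: customerId = text.toString().trim(); break;
            case ITEMNAME:   itemNames.add(text.toString().trim()); break;
            default:         break;
        }
        breadcrumbs.pop(); // the enclosing mode becomes current again
        text.setLength(0);
    }

    static ModeStackHandler parse(String xml) {
        ModeStackHandler handler = new ModeStackHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        ModeStackHandler h = parse("<Order><CustomerId>222</CustomerId>"
                + "<Item><ItemId>1</ItemId><ItemName>Widget</ItemName></Item>"
                + "<Item><ItemId>2</ItemId><ItemName>Gadget</ItemName></Item></Order>");
        System.out.println(h.customerId + " " + h.itemNames);
    }
}
```

    Adding a new element to handle then means adding one enum constant and one `case`, rather than another boolean flag.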
    

    2) The more "professional" solution:
    Use an abstract class that concrete classes must extend; the subclasses then cannot modify the Stack directly. NB this examines qName rather than localName. Thus:

    public abstract class AbstractSAXHandler extends DefaultHandler {
        protected enum PARSE_MODE implements SAXHandlerParseMode {
            __UNDEFINED__
        };
        // abstract: the concrete subclasses must populate...
        abstract protected Collection<Enum<?>> getPossibleModes();
        // 
        private Stack<SAXHandlerParseMode> modeBreadcrumbs = new Stack<SAXHandlerParseMode>();
        private Collection<Enum<?>> possibleModes;
        private Map<String, Enum<?>> nameToEnumMap;
        private Map<String, Enum<?>> getNameToEnumMap(){
            // lazy creation and population of map
            if( nameToEnumMap == null ){
                if( possibleModes == null ){
                    possibleModes = getPossibleModes();
                }
                nameToEnumMap = new HashMap<String, Enum<?>>();
                for( Enum<?> possibleMode : possibleModes ){
                    nameToEnumMap.put( possibleMode.name(), possibleMode ); 
                }
            }
            return nameToEnumMap;
        }
    
        protected boolean isLegitimateModeName( String name ){
            return getNameToEnumMap().containsKey( name );
        }
    
        protected SAXHandlerParseMode getParseMode() {
            return modeBreadcrumbs.isEmpty()? PARSE_MODE.__UNDEFINED__ : modeBreadcrumbs.peek();
        }
    
        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes)
                throws SAXException {
            try {
                _startElement(uri, localName, qName, attributes);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    
        // override in subclasses (NB I think checked Exceptions are not a brilliant design choice in Java)
        protected void _startElement(String uri, String localName, String qName, Attributes attributes)
                throws Exception {
            String qNameUC = qName.toUpperCase();
            // very undesirable ever to push "UNDEFINED"! But unlikely name for an XML element
            assert !qNameUC.equals("__UNDEFINED__") : "Encountered XML element with qName \"__UNDEFINED__\"!";
            if( getNameToEnumMap().containsKey( qNameUC )){
                Enum<?> newMode = getNameToEnumMap().get( qNameUC );
                modeBreadcrumbs.push( (SAXHandlerParseMode)newMode );
            }
        }
    
        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            try {
                _endElement(uri, localName, qName);
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    
        // override in subclasses
        protected void _endElement(String uri, String localName, String qName) throws Exception {
            String qNameUC = qName.toUpperCase();
            if( getNameToEnumMap().containsKey( qNameUC )){
                modeBreadcrumbs.pop(); 
            }
        }
    
        public List<?> showModeBreadcrumbs(){
            return org.apache.commons.collections4.ListUtils.unmodifiableList( modeBreadcrumbs );
        }
    
    }
    
    interface SAXHandlerParseMode {
    
    }
    

    Then, salient part of concrete subclass:

    private enum PARSE_MODE implements SAXHandlerParseMode {
        ORDER, DATE, CUSTOMERID, ITEM
    };
    
    private Collection<Enum<?>> possibleModes;
    
    @Override
    protected Collection<Enum<?>> getPossibleModes() {
        // lazy initialization
        if (possibleModes == null) {
            List<SAXHandlerParseMode> parseModes = new ArrayList<SAXHandlerParseMode>( Arrays.asList(PARSE_MODE.values()) );
            possibleModes = new ArrayList<Enum<?>>();
            for( SAXHandlerParseMode parseMode : parseModes ){
                possibleModes.add( PARSE_MODE.valueOf( parseMode.toString() ));
            }
            // __UNDEFINED__ mode (from abstract superclass) must be added afterwards
            possibleModes.add( AbstractSAXHandler.PARSE_MODE.__UNDEFINED__ );
        }
        return possibleModes;
    }
    

    PS: this is a starting point for more sophisticated designs. For example, you might maintain a List<Object> that is kept synchronised with the Stack<PARSE_MODE>; the Objects could be anything you want, letting you "reach back" into the ancestor "XML nodes" of the one you're dealing with. Don't use a Map, though: the Stack can contain the same PARSE_MODE object more than once. This illustrates a fundamental characteristic of all tree-like structures: no individual node (here: parse mode) exists in isolation; its identity is always defined by the entire path leading to it.
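
    One way to picture the "entire path" idea is a handler that keeps a second list in step with the stack of open elements. This is only a sketch: the `PathTrackingHandler` class and the sample XML are hypothetical, and the parallel list holds text buffers, although it could hold any object.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: keys each text value by the full path of element names leading to it.
final class PathTrackingHandler extends DefaultHandler {
    // Names of the currently open elements, outermost first.
    private final List<String> path = new ArrayList<>();
    // One text buffer per open element, kept in step with `path`.
    private final List<StringBuilder> buffers = new ArrayList<>();
    final Map<String, List<String>> textByPath = new LinkedHashMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.add(qName);
        buffers.add(new StringBuilder());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text belongs to the innermost open element.
        if (!buffers.isEmpty()) {
            buffers.get(buffers.size() - 1).append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String text = buffers.remove(buffers.size() - 1).toString().trim();
        if (!text.isEmpty()) {
            // The key is the whole path, so Order/Name and Order/Item/Name stay distinct.
            textByPath.computeIfAbsent(String.join("/", path), k -> new ArrayList<>()).add(text);
        }
        path.remove(path.size() - 1);
    }

    static PathTrackingHandler parse(String xml) {
        PathTrackingHandler handler = new PathTrackingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler;
    }

    public static void main(String[] args) {
        PathTrackingHandler h = parse(
                "<Order><Name>Ann</Name><Item><Name>Widget</Name></Item></Order>");
        System.out.println(h.textByPath);
    }
}
```

    Here the two `Name` elements end up under different keys because their paths differ, even though the element name is the same.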
