Using Java to parse XML

忘了有多久 2020-12-21 07:23

I have written a PHP script that parses an XML file. It is not easy to use, so I want to implement it in Java.

Inside the first element there are various count of

7 Answers
  • 2020-12-21 07:36

    DOM Parser through Recursion

    Using a DOM parser, you can easily end up in a mess of nested for loops, as you've already pointed out. However, the DOM represents a document as a Node whose children are held in a NodeList, and each item of that list is again a Node, which makes it a perfect candidate for recursion.
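
    As a minimal sketch of that idea, independent of the weather example, the snippet below walks a tiny inline document recursively and records each element name together with its depth. The class name RecursiveDomWalk, the SAMPLE_XML string and the "depth:name" output format are assumptions made for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

class RecursiveDomWalk {

    // Hypothetical sample document, used only for illustration
    static final String SAMPLE_XML = "<root><a><b/><c/></a><d/></root>";

    /** Parses the XML string and returns "depth:name" for every element, in document order. */
    static List<String> elementNames(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> names = new ArrayList<>();
            collect(document.getDocumentElement(), 0, names);
            return names;
        } catch (Exception e) {
            // Parser configuration, parse and I/O errors are wrapped to keep the sketch short
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** Records the node if it is an element, then recurses into each child node. */
    static void collect(Node node, int depth, List<String> names) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            names.add(depth + ":" + node.getNodeName());
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            collect(children.item(i), depth + 1, names);
        }
    }

    public static void main(String[] args) {
        for (String name : elementNames(SAMPLE_XML)) {
            System.out.println(name);
        }
    }
}
```

    The single recursive method replaces one nested loop per level of the document, so the same code handles any nesting depth.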

    Sample XML

    To show that a DOM parser copes regardless of the size of the XML, I took a hosted sample XML from OpenWeatherMap as the example.

    Searching by city name in XML format

    This XML contains London's weather forecast in 3-hour intervals. It makes a good case for reading through a relatively large data set and extracting specific information from the attributes of the child elements.

    [Image: snapshot of the XML with the target elements marked by arrows]

    In the snapshot, the elements we want to collect are marked by the arrows.

    The Code

    We start off by creating a custom class to hold the temperature and clouds values. We also override toString() in this class to print our records conveniently.

    ForeCast.java

    public class ForeCast {
    
        /**
         * Overridden toString() to conveniently print the results
         */
        @Override
        public String toString() {
            return "The minimum temperature is: " + getTemperature()
                    + " and the weather overall: " + getClouds();
        }
    
        public String getTemperature() {
            return temperature;
        }
    
        public void setTemperature(String temperature) {
            this.temperature = temperature;
        }
    
        public String getClouds() {
            return clouds;
        }
    
        public void setClouds(String clouds) {
            this.clouds = clouds;
        }
    
        private String temperature;
        private String clouds;
    }
    

    Now to the main class. In the main class, where we perform the recursion, we want to build a List of ForeCast objects that stores the individual temperature and clouds records collected while traversing the entire XML.

    // List collection which would hold all the data parsed through the XML
    // in the format defined by the custom type 'ForeCast'
    private static List<ForeCast> forecastList = new ArrayList<>();
    

    In the XML, the parent of both the temperature and clouds elements is time, so we logically check for the time element.

    /**
     * Logical block
     */
    // As per the XML syntax our 2 fields temperature and clouds come
    // directly under the Node/Element time
    if (node.getNodeName().equals("time")
            && node.getNodeType() == Node.ELEMENT_NODE) {
        // Instantiate our custom forecast object
        forecastObj = new ForeCast();
        Element timeElement = (Element) node;
    

    Thereafter, we get a handle on the temperature and clouds elements, whose values can be set on the ForeCast object.

        // Get the temperature element by its tag name within the XML (0th
        // index known)
        Element tempElement = (Element) timeElement.getElementsByTagName("temperature").item(0);
        // Minimum temperature value is selectively picked (for proof of concept)
        forecastObj.setTemperature(tempElement.getAttribute("min"));
    
        // Similarly get the clouds element
        Element cloudElement = (Element) timeElement.getElementsByTagName("clouds").item(0);
        forecastObj.setClouds(cloudElement.getAttribute("value"));
    

    The complete class below:

    CustomDomXmlParser.java

    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.List;
    
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import javax.xml.parsers.ParserConfigurationException;
    
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.Node;
    import org.w3c.dom.NodeList;
    import org.xml.sax.SAXException;
    
    public class CustomDomXmlParser {
    
        // List collection which would hold all the data parsed through the XML
        // in the format defined by the custom type 'ForeCast'
        private static List<ForeCast> forecastList = new ArrayList<>();
    
        public static void main(String[] args) throws ParserConfigurationException,
                SAXException, IOException {
            // Read XML through a URL (a FileInputStream can be used to pick up an
            // XML file from the file system)
            InputStream path = new URL(
                    "http://api.openweathermap.org/data/2.5/forecast?q=London,us&mode=xml")
                    .openStream();
    
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(path);
    
            // Call to the recursive method with the parent node
            traverse(document.getDocumentElement());
    
            // Print the List values collected within the recursive method
            for (ForeCast forecastObj : forecastList)
                System.out.println(forecastObj);
    
        }
    
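        /**
         * Recursively visits the given node and all of its descendants,
         * adding a ForeCast to forecastList for every time element found.
         *
         * @param node the node to start the traversal from
         */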
        public static void traverse(Node node) {
            // Get the list of Child Nodes immediate to the current node
            NodeList list = node.getChildNodes();
    
            // Declare our local instance of forecast object
            ForeCast forecastObj = null;
    
            /**
             * Logical block
             */
            // As per the XML syntax our 2 fields temperature and clouds come
            // directly under the Node/Element time
            if (node.getNodeName().equals("time")
                    && node.getNodeType() == Node.ELEMENT_NODE) {
    
                // Instantiate our custom forecast object
                forecastObj = new ForeCast();
                Element timeElement = (Element) node;
    
                // Get the temperature element by its tag name within the XML (0th
                // index known)
                Element tempElement = (Element) timeElement.getElementsByTagName(
                        "temperature").item(0);
                // Minimum temperature value is selectively picked (for proof of
                // concept)
                forecastObj.setTemperature(tempElement.getAttribute("min"));
    
                // Similarly get the clouds element
                Element cloudElement = (Element) timeElement.getElementsByTagName(
                        "clouds").item(0);
                forecastObj.setClouds(cloudElement.getAttribute("value"));
            }
    
            // Add our foreCastObj if initialized within this recursion, that is if
            // it traverses the time node within the XML, and not in any other case
            if (forecastObj != null)
                forecastList.add(forecastObj);
    
            /**
             * Recursion block
             */
            // Iterate over the next child nodes
            for (int i = 0; i < list.getLength(); i++) {
                Node currentNode = list.item(i);
                // Recursively invoke the method for the current node
                traverse(currentNode);
    
            }
    
        }
    }
    

    The Output

    As you can see from the screenshot below, we were able to group the 2 specific elements together and assign their values to a Java Collection instance. We delegated the complex parsing of the XML to the generic recursive solution and customized mainly the logical block. As mentioned, it is a generic solution that needs minimal customization and can work through any valid XML.

    [Image: screenshot of the program output]

    Alternatives

    Many other alternatives are available; here is a list of open source XML parsers for Java.

    However, your approach with PHP and your initial work with a Java-based parser align with the DOM-based XML parser solution, simplified by the use of recursion.

  • 2020-12-21 07:43

    Use the startElement and endElement callbacks of a SAX parser (they belong to SAX handlers, not to DOM parsers).
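
    A minimal sketch of how those callbacks could be used, assuming a simplified forecast document: the class name SaxCallbackDemo, the SAMPLE_XML string and the "from -> min" record format are hypothetical and not taken from the question. startElement and endElement are the callbacks of the JDK's SAX DefaultHandler.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class SaxCallbackDemo {

    // Hypothetical, simplified forecast document used only for illustration
    static final String SAMPLE_XML =
            "<forecast>"
          + "<time from=\"t1\"><temperature min=\"280.1\"/></time>"
          + "<time from=\"t2\"><temperature min=\"281.5\"/></time>"
          + "</forecast>";

    /** Returns one "from -> min" record per time element. */
    static List<String> records(String xml) {
        final List<String> result = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private String currentFrom;
            private String currentMin;

            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                // Called for every opening tag; attributes are only available here
                if ("time".equals(qName)) {
                    currentFrom = attributes.getValue("from");
                } else if ("temperature".equals(qName)) {
                    currentMin = attributes.getValue("min");
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                // Called for every closing tag; a closed time element completes a record
                if ("time".equals(qName)) {
                    result.add(currentFrom + " -> " + currentMin);
                    currentFrom = null;
                    currentMin = null;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Parser configuration, parse and I/O errors are wrapped to keep the sketch short
            throw new IllegalStateException("Could not parse XML", e);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String record : records(SAMPLE_XML)) {
            System.out.println(record);
        }
    }
}
```

    Unlike DOM, SAX does not build a tree in memory, so the handler has to keep whatever state it needs between the callbacks.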

  • 2020-12-21 07:44

    The easiest way, if performance is not a main concern, is probably XPath. With XPath, you can find nodes and attributes simply by specifying a path.

    XPathFactory xPathfactory = XPathFactory.newInstance();
    XPath xpath = xPathfactory.newXPath();
    XPathExpression expr = xpath.compile(<xpath_expression>);
    NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
    

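    The xpath_expression could be as simple as the one below. Because it is wrapped in string(), it yields a string and must be evaluated with XPathConstants.STRING instead of XPathConstants.NODESET; without string(), it yields a node set.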

    "string(//member/observedProperty/@href)"
    

    For more information about XPath, XPath Tutorial from W3Schools is pretty good.
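
    To make the XPath approach concrete, here is a small self-contained sketch. The class name XPathDemo, the SAMPLE_XML string and its example.org URLs are assumptions for illustration, and the sample deliberately avoids namespaces; a namespaced document would additionally need a namespace-aware parser and a NamespaceContext. It shows the same path evaluated once as a string and once as a node set.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;

import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathDemo {

    // Hypothetical sample document without namespaces, used only for illustration
    static final String SAMPLE_XML =
            "<collection>"
          + "<member><observedProperty href=\"http://example.org/temp\"/></member>"
          + "<member><observedProperty href=\"http://example.org/wind\"/></member>"
          + "</collection>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    /** string(...) returns the value of the first matching node, so STRING is requested. */
    static String firstHref(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (String) xpath.evaluate(
                    "string(//member/observedProperty/@href)", parse(xml), XPathConstants.STRING);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    /** Without string(...), the path selects a node set, so NODESET is requested. */
    static List<String> allHrefs(String xml) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//member/observedProperty/@href", parse(xml), XPathConstants.NODESET);
            List<String> hrefs = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                hrefs.add(nodes.item(i).getNodeValue());
            }
            return hrefs;
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstHref(SAMPLE_XML));
        System.out.println(allHrefs(SAMPLE_XML));
    }
}
```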

  • 2020-12-21 07:52

    The Java APIs, while giving you everything you need, are pretty ridiculous to use as you can see. You might check out Xsylum for something more straightforward:

    (Guessing how your XML is structured):

    List<XmlElement> elements = Xsylum.elementFor(xmlFile).getAll("wfs:member");
    for (XmlElement e : elements) {
      // A local variable declaration needs a block to be valid as a loop body
      String dataType = e.get("omso").get("om").attribute("xlink");
    }
    

    As suggested elsewhere, you also might want to just use XPath to extract what you're after, which is also straightforward with Xsylum:

    List<String> values = Xsylum.documentFor(xmlFile).values("//omso/om/@href");
    
  • 2020-12-21 07:52

    I agree with what has already been posted about not implementing parse functions yourself.

    Instead of DOM/SAX/StAX parsers, though, I would suggest using JDOM or XOM, which are external libraries.

    Related discussions:

    • What Java XML library do you recommend (to replace dom4j)?
    • Should I still be using JDOM with Java 5 or 6?

    My gut feeling is that JDOM is the one most Java developers use. Some use dom4j, some XOM, some others, but hardly anybody implements these parsing functions themselves.

  • 2020-12-21 07:57

    I wouldn't suggest implementing your own function for XML parsing, since there are already many options out there. My suggestion is the DOM parser. You can find a few examples at the following link. (You can also choose from the other available options.)

    http://www.javacodegeeks.com/2013/05/parsing-xml-using-dom-sax-and-stax-parser-in-java.html

    You can use commands such as

    eElement.getAttribute("id");
    

    Source: http://www.mkyong.com/java/how-to-read-xml-file-in-java-dom-parser/
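
    For context, a call such as eElement.getAttribute("id") could sit in a small program like the sketch below. The class name DomAttributeDemo, the SAMPLE_XML string and the employee element are hypothetical; only the standard JDK DOM API is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.DocumentBuilderFactory;

import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DomAttributeDemo {

    // Hypothetical sample document, used only for illustration
    static final String SAMPLE_XML =
            "<staff>"
          + "<employee id=\"1001\"><name>A</name></employee>"
          + "<employee id=\"2001\"><name>B</name></employee>"
          + "</staff>";

    /** Returns the id attribute of every employee element in the document. */
    static List<String> employeeIds(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            // getElementsByTagName searches all descendants, so no manual recursion is needed
            NodeList employees = doc.getElementsByTagName("employee");
            List<String> ids = new ArrayList<>();
            for (int i = 0; i < employees.getLength(); i++) {
                Element eElement = (Element) employees.item(i);
                ids.add(eElement.getAttribute("id"));
            }
            return ids;
        } catch (Exception e) {
            // Parser configuration, parse and I/O errors are wrapped to keep the sketch short
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(employeeIds(SAMPLE_XML));
    }
}
```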
