I want to read XML data using XPath in Java, so for the information I have gathered I am not able to parse XML according to my requirement.
here is what I want to do
This shows you how to
DOM
Nodes
with XPath
Nodes
. We will call the code with the following statement
processFilteredXml(xmlIn, xpathExpr,(node) -> {/*Do something...*/;});
In our case we want to print some creatorNames
from a book.xml
using "//book/creators/creator/creatorName"
as xpath to perform a printNode
action on each Node that matches the XPath
.
Full code
@Test
public void printXml() {
try (InputStream in = readFile("book.xml")) {
processFilteredXml(in, "//book/creators/creator/creatorName", (node) -> {
printNode(node, System.out);
});
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private InputStream readFile(String yourSampleFile) {
return Thread.currentThread().getContextClassLoader().getResourceAsStream(yourSampleFile);
}
private void processFilteredXml(InputStream in, String xpath, Consumer<Node> process) {
Document doc = readXml(in);
NodeList list = filterNodesByXPath(doc, xpath);
for (int i = 0; i < list.getLength(); i++) {
Node node = list.item(i);
process.accept(node);
}
}
public Document readXml(InputStream xmlin) {
try {
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
DocumentBuilder db = dbf.newDocumentBuilder();
return db.parse(xmlin);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private NodeList filterNodesByXPath(Document doc, String xpathExpr) {
try {
XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xpath = xPathFactory.newXPath();
XPathExpression expr = xpath.compile(xpathExpr);
Object eval = expr.evaluate(doc, XPathConstants.NODESET);
return (NodeList) eval;
} catch (Exception e) {
throw new RuntimeException(e);
}
}
private void printNode(Node node, PrintStream out) {
try {
Transformer transformer = TransformerFactory.newInstance().newTransformer();
transformer.setOutputProperty(OutputKeys.INDENT, "yes");
transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
StreamResult result = new StreamResult(new StringWriter());
DOMSource source = new DOMSource(node);
transformer.transform(source, result);
String xmlString = result.getWriter().toString();
out.println(xmlString);
} catch (Exception e) {
throw new RuntimeException(e);
}
}
Prints
<creatorName>Fosmire, Michael</creatorName>
<creatorName>Wertz, Ruth</creatorName>
<creatorName>Purzer, Senay</creatorName>
For book.xml
<book>
<creators>
<creator>
<creatorName>Fosmire, Michael</creatorName>
<givenName>Michael</givenName>
<familyName>Fosmire</familyName>
</creator>
<creator>
<creatorName>Wertz, Ruth</creatorName>
<givenName>Ruth</givenName>
<familyName>Wertz</familyName>
</creator>
<creator>
<creatorName>Purzer, Senay</creatorName>
<givenName>Senay</givenName>
<familyName>Purzer</familyName>
</creator>
</creators>
<titles>
<title>Critical Engineering Literacy Test (CELT)</title>
</titles>
</book>
Getting started example:
xml file:
<inventory>
<book year="2000">
<title>Snow Crash</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553380958</isbn>
<price>14.95</price>
</book>
<book year="2005">
<title>Burning Tower</title>
<author>Larry Niven</author>
<author>Jerry Pournelle</author>
<publisher>Pocket</publisher>
<isbn>0743416910</isbn>
<price>5.99</price>
</book>
<book year="1995">
<title>Zodiac</title>
<author>Neal Stephenson</author>
<publisher>Spectra</publisher>
<isbn>0553573862</isbn>
<price>7.50</price>
</book>
<!-- more books... -->
</inventory>
Java code:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.testng.annotations.DataProvider;
import org.testng.annotations.Test;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
try {
DocumentBuilderFactory docBuilderFactory = DocumentBuilderFactory.newInstance();
DocumentBuilder docBuilder = docBuilderFactory.newDocumentBuilder();
Document doc = docBuilder.parse (new File("c:\\tmp\\my.xml"));
// normalize text representation
doc.getDocumentElement().normalize();
System.out.println ("Root element of the doc is " + doc.getDocumentElement().getNodeName());
NodeList listOfBooks = doc.getElementsByTagName("book");
int totalBooks = listOfBooks.getLength();
System.out.println("Total no of books : " + totalBooks);
for(int i=0; i<listOfBooks.getLength() ; i++) {
Node firstBookNode = listOfBooks.item(i);
if(firstBookNode.getNodeType() == Node.ELEMENT_NODE) {
Element firstElement = (Element)firstBookNode;
System.out.println("Year :"+firstElement.getAttribute("year"));
//-------
NodeList firstNameList = firstElement.getElementsByTagName("title");
Element firstNameElement = (Element)firstNameList.item(0);
NodeList textFNList = firstNameElement.getChildNodes();
System.out.println("title : " + ((Node)textFNList.item(0)).getNodeValue().trim());
}
}//end of for loop with s var
} catch (SAXParseException err) {
System.out.println ("** Parsing error" + ", line " + err.getLineNumber () + ", uri " + err.getSystemId ());
System.out.println(" " + err.getMessage ());
} catch (SAXException e) {
Exception x = e.getException ();
((x == null) ? e : x).printStackTrace ();
} catch (Throwable t) {
t.printStackTrace ();
}
You can try this.
Save as employees.xml
.
<?xml version="1.0" encoding="UTF-8"?>
<Employees>
<Employee id="1">
<age>29</age>
<name>Pankaj</name>
<gender>Male</gender>
<role>Java Developer</role>
</Employee>
<Employee id="2">
<age>35</age>
<name>Lisa</name>
<gender>Female</gender>
<role>CEO</role>
</Employee>
<Employee id="3">
<age>40</age>
<name>Tom</name>
<gender>Male</gender>
<role>Manager</role>
</Employee>
<Employee id="4">
<age>25</age>
<name>Meghan</name>
<gender>Female</gender>
<role>Manager</role>
</Employee>
</Employees>
The class have following methods
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;
public class Parser {
public static void main(String[] args) {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder;
Document doc = null;
try {
builder = factory.newDocumentBuilder();
doc = builder.parse("employees.xml");
// Create XPathFactory object
XPathFactory xpathFactory = XPathFactory.newInstance();
// Create XPath object
XPath xpath = xpathFactory.newXPath();
String name = getEmployeeNameById(doc, xpath, 4);
System.out.println("Employee Name with ID 4: " + name);
List<String> names = getEmployeeNameWithAge(doc, xpath, 30);
System.out.println("Employees with 'age>30' are:" + Arrays.toString(names.toArray()));
List<String> femaleEmps = getFemaleEmployeesName(doc, xpath);
System.out.println("Female Employees names are:" +
Arrays.toString(femaleEmps.toArray()));
} catch (ParserConfigurationException | SAXException | IOException e) {
e.printStackTrace();
}
}
private static List<String> getFemaleEmployeesName(Document doc, XPath xpath) {
List<String> list = new ArrayList<>();
try {
//create XPathExpression object
XPathExpression expr =
xpath.compile("/Employees/Employee[gender='Female']/name/text()");
//evaluate expression result on XML document
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++)
list.add(nodes.item(i).getNodeValue());
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return list;
}
private static List<String> getEmployeeNameWithAge(Document doc, XPath xpath, int age) {
List<String> list = new ArrayList<>();
try {
XPathExpression expr =
xpath.compile("/Employees/Employee[age>" + age + "]/name/text()");
NodeList nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
for (int i = 0; i < nodes.getLength(); i++)
list.add(nodes.item(i).getNodeValue());
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return list;
}
private static String getEmployeeNameById(Document doc, XPath xpath, int id) {
String name = null;
try {
XPathExpression expr =
xpath.compile("/Employees/Employee[@id='" + id + "']/name/text()");
name = (String) expr.evaluate(doc, XPathConstants.STRING);
} catch (XPathExpressionException e) {
e.printStackTrace();
}
return name;
}
}
You need something along the lines of this:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(<uri_as_string>);
XPathFactory xPathfactory = XPathFactory.newInstance();
XPath xpath = xPathfactory.newXPath();
XPathExpression expr = xpath.compile(<xpath_expression>);
Then you call expr.evaluate()
passing in the document defined in that code and the return type you are expecting, and cast the result to the object type of the result.
If you need help with a specific XPath expressions, you should probably ask it as separate questions (unless that was your question in the first place here - I understood your question to be how to use the API in Java).
Edit: (Response to comment): This XPath expression will get you the text of the first URL element under PowerBuilder:
/howto/topic[@name='PowerBuilder']/url/text()
This will get you the second:
/howto/topic[@name='PowerBuilder']/url[2]/text()
You get that with this code:
expr.evaluate(doc, XPathConstants.STRING);
If you don't know how many URLs are in a given node, then you should rather do something like this:
XPathExpression expr = xpath.compile("/howto/topic[@name='PowerBuilder']/url");
NodeList nl = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
And then loop over the NodeList.
Expanding on the excellent answer by @bluish and @Yishai, here is how you make the NodeLists and node attributes support iterators, i.e. the for(Node n: nodelist)
interface.
Use it like:
NodeList nl = ...
for(Node n : XmlUtil.asList(nl))
{...}
and
Node n = ...
for(Node attr : XmlUtil.asList(n.getAttributes())
{...}
The code:
/**
* Converts NodeList to an iterable construct.
* From: https://stackoverflow.com/a/19591302/779521
*/
public final class XmlUtil {
private XmlUtil() {}
public static List<Node> asList(NodeList n) {
return n.getLength() == 0 ? Collections.<Node>emptyList() : new NodeListWrapper(n);
}
static final class NodeListWrapper extends AbstractList<Node> implements RandomAccess {
private final NodeList list;
NodeListWrapper(NodeList l) {
this.list = l;
}
public Node get(int index) {
return this.list.item(index);
}
public int size() {
return this.list.getLength();
}
}
public static List<Node> asList(NamedNodeMap n) {
return n.getLength() == 0 ? Collections.<Node>emptyList() : new NodeMapWrapper(n);
}
static final class NodeMapWrapper extends AbstractList<Node> implements RandomAccess {
private final NamedNodeMap list;
NodeMapWrapper(NamedNodeMap l) {
this.list = l;
}
public Node get(int index) {
return this.list.item(index);
}
public int size() {
return this.list.getLength();
}
}
}
If you have a xml like below
<e:Envelope
xmlns:d = "http://www.w3.org/2001/XMLSchema"
xmlns:e = "http://schemas.xmlsoap.org/soap/envelope/"
xmlns:wn0 = "http://systinet.com/xsd/SchemaTypes/"
xmlns:i = "http://www.w3.org/2001/XMLSchema-instance">
<e:Header>
<Friends>
<friend>
<Name>Testabc</Name>
<Age>12121</Age>
<Phone>Testpqr</Phone>
</friend>
</Friends>
</e:Header>
<e:Body>
<n0:ForAnsiHeaderOperResponse xmlns:n0 = "http://systinet.com/wsdl/com/magicsoftware/ibolt/localhost/ForAnsiHeader/ForAnsiHeaderImpl#ForAnsiHeaderOper?KExqYXZhL2xhbmcvU3RyaW5nOylMamF2YS9sYW5nL1N0cmluZzs=">
<response i:type = "d:string">12--abc--pqr</response>
</n0:ForAnsiHeaderOperResponse>
</e:Body>
</e:Envelope>
and wanted to extract the below xml
<e:Header>
<Friends>
<friend>
<Name>Testabc</Name>
<Age>12121</Age>
<Phone>Testpqr</Phone>
</friend>
</Friends>
</e:Header>
The below code helps to achieve the same
public static void main(String[] args) {
File fXmlFile = new File("C://Users//abhijitb//Desktop//Test.xml");
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
Document document;
Node result = null;
try {
document = dbf.newDocumentBuilder().parse(fXmlFile);
XPath xPath = XPathFactory.newInstance().newXPath();
String xpathStr = "//Envelope//Header";
result = (Node) xPath.evaluate(xpathStr, document, XPathConstants.NODE);
System.out.println(nodeToString(result));
} catch (SAXException | IOException | ParserConfigurationException | XPathExpressionException
| TransformerException e) {
e.printStackTrace();
}
}
private static String nodeToString(Node node) throws TransformerException {
StringWriter buf = new StringWriter();
Transformer xform = TransformerFactory.newInstance().newTransformer();
xform.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
xform.transform(new DOMSource(node), new StreamResult(buf));
return (buf.toString());
}
Now if you want only the xml like below
<Friends>
<friend>
<Name>Testabc</Name>
<Age>12121</Age>
<Phone>Testpqr</Phone>
</friend>
</Friends>
You need to change the
String xpathStr = "//Envelope//Header";
to String xpathStr = "//Envelope//Header/*";