Question
I have an HTML document, and somewhere inside it is the table below. I can get the table rows as Java DOM objects. What is not clear to me is how to extract the value of a table cell, both when the value is a string and when it is a binary resource (an image).
I am using code like:
XPath xpath;
XPathExpression expr;
NodeList nodes = null;
// Use XPath to obtain whatever you want from the (X)HTML
try {
    xpath = XPathFactory.newInstance().newXPath();
    // <table class="data">
    NodeList list = doc.getElementsByTagName("table");
    // Node node = list.item(0);
    // System.out.println(node.getTextContent());
    // String textContent = node.getTextContent();
    expr = xpath.compile("//table/tr/td");
    nodes = (NodeList) expr.evaluate(doc, XPathConstants.NODESET);
and looping like:
for (int i = 0; i < nodes.getLength(); i++) {
    Node ln = list.item(i);
    String lnText = ln.toString();
    NodeList rowElements = ln.getChildNodes();
    Node one = rowElements.item(0);
    String oneText = one.toString();
    String nodeName = one.getNodeName();
    String valOne = one.getNodeValue();
But I am not seeing the values in the table.
<table class="data">
<tr><td>ImageName1</td><td width="50"></td><td><img src="/images/036000291452" alt="036000291452" /></td></tr>
<tr><td>ImageName2</td><td width="50"></td><td><img src="/images/36000291452" alt="36000291452" /></td></tr>
<tr><td>Description</td><td></td><td>Time Magazine</td></tr>
<tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>
<tr><td>Issuing Country</td><td></td><td>United States</td></tr>
</table>
Answer 1:
This XPath expression:
/*/tr[1]/td[1]
selects the td element (in no namespace) that is the first child of the first tr child of the top element (table) of the provided XML document.
The XPath expression:
/*/tr[1]/td[2]
selects the td element (in no namespace) that is the second child of the first tr child of the top element (table) of the provided XML document.
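A minimal Java sketch of evaluating these two paths with javax.xml.xpath, assuming the table is parsed on its own as well-formed XML so that table really is the top element. If the whole HTML page is parsed instead, the top element is html, and some HTML parsers also insert a tbody element, so the paths would need adjusting. The class TableCellXPath and its helper cellText are hypothetical names for illustration. The cell is read with getTextContent(), because getNodeValue() returns null for element nodes such as td.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TableCellXPath {
    // The table from the question as a standalone XML document, so <table> is the top element.
    static final String TABLE = "<table class=\"data\">"
        + "<tr><td>ImageName1</td><td width=\"50\"></td><td><img src=\"/images/036000291452\" alt=\"036000291452\" /></td></tr>"
        + "<tr><td>ImageName2</td><td width=\"50\"></td><td><img src=\"/images/36000291452\" alt=\"36000291452\" /></td></tr>"
        + "<tr><td>Description</td><td></td><td>Time Magazine</td></tr>"
        + "<tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>"
        + "<tr><td>Issuing Country</td><td></td><td>United States</td></tr>"
        + "</table>";

    // Returns the text of the first node matched by path, or null when nothing matches.
    static String cellText(String xml, String path) {
        try {
            // The markup declares no xmlns, so td and tr are in no namespace, as the paths expect.
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // XPathConstants.NODE returns a single Node instead of a NodeList.
            Node td = (Node) xpath.evaluate(path, doc, XPathConstants.NODE);
            // getNodeValue() is null for element nodes; getTextContent() gives the text inside.
            return td == null ? null : td.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cellText(TABLE, "/*/tr[1]/td[1]")); // ImageName1
        System.out.println("[" + cellText(TABLE, "/*/tr[1]/td[2]") + "]"); // [] (the empty width="50" cell)
    }
}
```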
In general:
/*/tr[$m]/td[$n]
selects the td element (in no namespace) that is the $n-th child of the $m-th tr child of the top element (table) of the provided XML document. Just replace $m and $n with the desired integer values.
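Since $m and $n are already XPath variable syntax, one option in Java is to keep the expression fixed and bind the variables through javax.xml.xpath.XPathVariableResolver (installed with XPath.setXPathVariableResolver) instead of editing the string. This is a sketch under the same standalone-table assumption; the class TableCellVariables and method cell are hypothetical, and the values are passed as Double, the Java type JAXP uses for XPath numbers.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class TableCellVariables {
    // The question's table, parsed standalone so that /* is the <table> element.
    static final String TABLE = "<table class=\"data\">"
        + "<tr><td>ImageName1</td><td width=\"50\"></td><td><img src=\"/images/036000291452\" alt=\"036000291452\" /></td></tr>"
        + "<tr><td>ImageName2</td><td width=\"50\"></td><td><img src=\"/images/36000291452\" alt=\"36000291452\" /></td></tr>"
        + "<tr><td>Description</td><td></td><td>Time Magazine</td></tr>"
        + "<tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>"
        + "<tr><td>Issuing Country</td><td></td><td>United States</td></tr>"
        + "</table>";

    // Evaluates /*/tr[$m]/td[$n] with $m and $n bound as real XPath variables.
    static String cell(String xml, int m, int n) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setXPathVariableResolver((QName name) -> {
                // A numeric value in a predicate is compared with position().
                if ("m".equals(name.getLocalPart())) {
                    return Double.valueOf(m);
                }
                if ("n".equals(name.getLocalPart())) {
                    return Double.valueOf(n);
                }
                return null; // unknown variable
            });
            Node td = (Node) xpath.evaluate("/*/tr[$m]/td[$n]", doc, XPathConstants.NODE);
            return td == null ? null : td.getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cell(TABLE, 3, 3)); // Time Magazine
    }
}
```

Binding variables keeps a single expression that can be compiled once and reused for every (m, n) pair, rather than building a new expression string per cell.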
You can use the standard XPath function string() to obtain their string value:
string(/*/tr[$m]/td[$n])
evaluates to the string value of the td element (in no namespace) that is the $n-th child of the $m-th tr child of the top element (table) of the provided XML document.
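A sketch of the string() form in Java, again assuming the standalone table: asking for XPathConstants.STRING returns the string value directly, and count(/*/tr) (returned as a Double for XPathConstants.NUMBER) gives the number of rows to loop over. The class TableRowStrings and method rows are hypothetical. Splicing m into the expression text is harmless here only because it is an int the code itself controls.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class TableRowStrings {
    // The question's table, parsed standalone so that /* is the <table> element.
    static final String TABLE = "<table class=\"data\">"
        + "<tr><td>ImageName1</td><td width=\"50\"></td><td><img src=\"/images/036000291452\" alt=\"036000291452\" /></td></tr>"
        + "<tr><td>ImageName2</td><td width=\"50\"></td><td><img src=\"/images/36000291452\" alt=\"36000291452\" /></td></tr>"
        + "<tr><td>Description</td><td></td><td>Time Magazine</td></tr>"
        + "<tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>"
        + "<tr><td>Issuing Country</td><td></td><td>United States</td></tr>"
        + "</table>";

    // Maps the first cell of each row (the label) to the string value of its third cell.
    static Map<String, String> rows(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            int rowCount = ((Double) xpath.evaluate("count(/*/tr)", doc, XPathConstants.NUMBER)).intValue();
            Map<String, String> result = new LinkedHashMap<>();
            for (int m = 1; m <= rowCount; m++) {
                // A cell holding only an <img> has an empty string value:
                // attributes such as src are not part of an element's string value.
                String label = (String) xpath.evaluate("string(/*/tr[" + m + "]/td[1])", doc, XPathConstants.STRING);
                String value = (String) xpath.evaluate("string(/*/tr[" + m + "]/td[3])", doc, XPathConstants.STRING);
                result.put(label, value);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        rows(TABLE).forEach((label, value) -> System.out.println(label + " = " + value));
    }
}
```

With the question's table, the Description, Size/Weight and Issuing Country rows map to their text, while ImageName1 and ImageName2 map to empty strings, which is why the image cells need a different path.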
Answer 2:
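Use a path like "string(//td)" to get the string contents of a cell. Note that in XPath 1.0, string() applied to a node-set returns the string value of only the first node in document order, so to read every cell, select the nodes with "//td" and take the text of each one. For linked resources, you will need to use something like "//td/img/@src" to get the URLs, then resolve them against the source URL, and fetch the resulting URL from the network.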
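Both steps can be sketched in Java, assuming the same standalone table: select //td as a node-set and read each node's text, then select //td/img/@src and resolve each value with java.net.URI.resolve. The page URL http://example.com/products/page.html and the class and method names are hypothetical. Downloading is left out because it needs network access; each resolved URI could then be fetched, for example with URI.toURL().openStream() or java.net.http.HttpClient (Java 11+).

```java
import java.io.StringReader;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TableCellsAndImages {
    // The question's table, parsed standalone as well-formed XML.
    static final String TABLE = "<table class=\"data\">"
        + "<tr><td>ImageName1</td><td width=\"50\"></td><td><img src=\"/images/036000291452\" alt=\"036000291452\" /></td></tr>"
        + "<tr><td>ImageName2</td><td width=\"50\"></td><td><img src=\"/images/36000291452\" alt=\"36000291452\" /></td></tr>"
        + "<tr><td>Description</td><td></td><td>Time Magazine</td></tr>"
        + "<tr><td>Size/Weight</td><td></td><td>14 Issues</td></tr>"
        + "<tr><td>Issuing Country</td><td></td><td>United States</td></tr>"
        + "</table>";

    static NodeList select(String xml, String path) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        return (NodeList) xpath.evaluate(path, doc, XPathConstants.NODESET);
    }

    // Text of every <td> in document order: select the node-set, then read each node.
    static List<String> cellTexts(String xml) {
        try {
            NodeList tds = select(xml, "//td");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < tds.getLength(); i++) {
                texts.add(tds.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // The src of every <img> inside a cell, resolved against the page's own URL.
    static List<URI> imageUrls(String xml, String pageUrl) {
        try {
            NodeList srcs = select(xml, "//td/img/@src");
            URI base = URI.create(pageUrl);
            List<URI> urls = new ArrayList<>();
            for (int i = 0; i < srcs.getLength(); i++) {
                // For attribute nodes, getNodeValue() is the attribute's value.
                urls.add(base.resolve(srcs.item(i).getNodeValue()));
            }
            return urls;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(cellTexts(TABLE));
        // Hypothetical page URL; use the address the document was actually loaded from.
        System.out.println(imageUrls(TABLE, "http://example.com/products/page.html"));
    }
}
```

Because the src values start with "/", they resolve against the host only, giving http://example.com/images/036000291452 and http://example.com/images/36000291452.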
Source: https://stackoverflow.com/questions/5931352/xpath-how-to-retrieve-the-value-of-a-table-cell-from-html-document