Question
How can I extract data from an HTML table in PHP? The data is in this format:
Table 1
<tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr>
Table 2
<tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr>
Table 3
<tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr>
I want to get Data and Data_Text (or Data_Text_1 and Data_Text_2) from the three tables.
I've tried:
$html = file_get_contents($link);
$doc = new DOMDocument();
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//td[]');
$nodes2 = $xpath->query('//td[]');
But it doesn't show any data.
I'll offer a bounty on this question the day after tomorrow.
Answer 1:
Using simple_html_dom.php (PHP Simple HTML DOM Parser)...
<?php
include 'simple_html_dom.php';
// Parse the file and print the text content of each table row
$html = file_get_html('thetable.html');
$rows = $html->find('tr');
foreach($rows as $row) {
    echo $row->plaintext . "\n";
}
?>
Or select 'td' to get each cell individually...
<?php
include 'simple_html_dom.php';
// Parse the file and print the text content of each table cell
$html = file_get_html('thetable.html');
$cells = $html->find('td');
foreach($cells as $cell) {
    echo $cell->plaintext . "\n";
}
?>
Answer 2:
Given an HTML document called xpathTables.html like this:
<html>
<body>
<table>
<tbody>
<tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td><td class="body" valign="top">Data_Text</td></tr>
</tbody>
</table>
<table>
<tbody>
<tr><th><div id="Data">Data</div></th><td>Data_Text_1</td><td>Data_Text_2</td></tr>
</tbody>
</table>
<table>
<tbody>
<tr><td width="120"><a href="example" target="_blank">DATA</a></td><td>Data_Text</td></tr>
</tbody>
</table>
</body>
</html>
And this PHP script:
<?php
$link = "xpathTables.html";
$html = file_get_contents($link);
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$tables = $doc->getElementsByTagName('table');
// Table 1: the bold link text, then the second td.body cell
$nodes = $xpath->query('.//tbody/tr/td/a/b', $tables->item(0));
var_dump($nodes->item(0)->nodeValue);
$nodes = $xpath->query('.//tbody/tr/td[@class="body"]', $tables->item(0));
var_dump($nodes->item(1)->nodeValue);
// Table 2: the header div, then both td cells from a single query
$nodes = $xpath->query('.//tbody/tr/th/div[@id="Data"]', $tables->item(1));
var_dump($nodes->item(0)->nodeValue);
$nodes = $xpath->query('.//tbody/tr/td', $tables->item(1));
var_dump($nodes->item(0)->nodeValue);
var_dump($nodes->item(1)->nodeValue);
// Table 3: the link text, then the second td cell
$nodes = $xpath->query('.//tbody/tr/td/a', $tables->item(2));
var_dump($nodes->item(0)->nodeValue);
$nodes = $xpath->query('.//tbody/tr/td', $tables->item(2));
var_dump($nodes->item(1)->nodeValue);
You get this output:
string(4) "DATA"
string(9) "Data_Text"
string(4) "Data"
string(11) "Data_Text_1"
string(11) "Data_Text_2"
string(4) "DATA"
string(9) "Data_Text"
I didn't fully understand your question, so I wrote this example to show all the text nodes your tables contain. If you are only interested in some of those nodes, pick the XPath queries that select them.
I included the table and tbody tags just to make the example more HTML-like.
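If every row is wanted rather than hand-picked nodes, the per-table queries above can be folded into a loop. The following is only a sketch under stated assumptions: the helper name extractRows is hypothetical (it is not part of the original answer or of PHP), the markup is inlined as a string instead of being read from xpathTables.html, and each row is assumed to contain only th and td cells.

```php
<?php
// Hypothetical helper (not from the original answer): returns one array per <tr>,
// holding the trimmed text of each <th>/<td> cell in document order.
function extractRows(string $html): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $rows = [];
    foreach ($xpath->query('//tr') as $tr) {
        $cells = [];
        // Relative query: only the th/td elements that are direct children of this row
        foreach ($xpath->query('./th | ./td', $tr) as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $rows[] = $cells;
    }
    return $rows;
}

// Same three tables as in xpathTables.html, inlined so the sketch is self-contained
$html = '<html><body>'
      . '<table><tbody><tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td>'
      . '<td class="body" valign="top">Data_Text</td></tr></tbody></table>'
      . '<table><tbody><tr><th><div id="Data">Data</div></th>'
      . '<td>Data_Text_1</td><td>Data_Text_2</td></tr></tbody></table>'
      . '<table><tbody><tr><td width="120"><a href="example" target="_blank">DATA</a></td>'
      . '<td>Data_Text</td></tr></tbody></table>'
      . '</body></html>';

$rows = extractRows($html);
print_r($rows);
```

The first element of each inner array is the label cell (DATA or Data) and the remaining elements are its values, so the same loop covers all three table layouts without a separate query per table.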
Answer 3:
Use this single XPath expression:
/*/table/tr//text()[normalize-space()]
This selects every text node that does not consist solely of whitespace characters and that is a descendant of any tr element that is a child of a table element that is a child of the top element of the document.
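Since the question is about PHP, here is a rough sketch of evaluating a similar expression with DOMXPath. It rests on a few assumptions: the rows from the question are wrapped in table elements and inlined as a string, and the path is relaxed to //table//tr because DOMDocument::loadHTML typically wraps the parsed content in html and body elements, so the absolute /*/table/tr form would probably not match as written.

```php
<?php
// Rows from the question, each wrapped in its own <table> (an assumption of this sketch)
$html = '<table><tr><td class="body" valign="top"><a href="example"><b>DATA</b></a></td>'
      . '<td class="body" valign="top">Data_Text</td></tr></table>'
      . '<table><tr><th><div id="Data">Data</div></th>'
      . '<td>Data_Text_1</td><td>Data_Text_2</td></tr></table>'
      . '<table><tr><td width="120"><a href="example" target="_blank">DATA</a></td>'
      . '<td>Data_Text</td></tr></table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // keep parser warnings out of the output
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Relaxed variant of the expression: every non-whitespace text node inside any table row
$texts = [];
foreach ($xpath->query('//table//tr//text()[normalize-space()]') as $node) {
    $texts[] = trim($node->nodeValue);
}

print_r($texts);
```

The predicate [normalize-space()] does the filtering here: a text node made up only of whitespace normalizes to an empty string, which XPath treats as false, so indentation between tags is skipped.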
XSLT-based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/table/tr//text()[normalize-space()]"/>
. . . . . . .
<xsl:for-each select=
"/*/table/tr//text()[normalize-space()]">
"<xsl:copy-of select="."/>"
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied against the following XML document:
<html>
<table>
<tr>
<td class="body" valign="top">
<a href="example">
<b>DATA</b>
</a>
</td>
<td class="body" valign="top">Data_Text</td>
</tr>
</table>
<table>
<tr>
<th>
<div id="Data">Data</div>
</th>
<td>Data_Text_1</td>
<td>Data_Text_2</td>
</tr>
</table>
<table>
<tr>
<td width="120">
<a href="example" target="_blank">DATA</a>
</td>
<td>Data_Text</td>
</tr>
</table>
</html>
the XPath expression is evaluated and the selected text nodes are output twice: first concatenated, as the direct result of the evaluation, and then one per line, each surrounded by quotes:
DATAData_TextDataData_Text_1Data_Text_2DATAData_Text
. . . . . . .
"DATA"
"Data_Text"
"Data"
"Data_Text_1"
"Data_Text_2"
"DATA"
"Data_Text"
Source: https://stackoverflow.com/questions/10369350/extract-data-from-html-table-row-column