Question
I have an HTML table that I would like to parse in PHP and store in a MySQL database. The HTML looks like this:
<tr><td>DATE</td><td>LOCATION</td><td><a href="URL">NAME</a></td></tr>
I would like to create a PHP function that returns the fields shown in capital letters as an array. Does anyone know of any PHP libraries that can do this, or should I be using a different language, since this may be complex? I don't know exactly how to do this when there are many tables on the page. I am trying to parse the VEX events on RobotEvents, and the table that I want to parse starts at line 465 of the page source.
Answer 1:
As you're prepared to look beyond PHP, Nokogiri (Ruby) and Beautiful Soup (Python) are well-established libraries that parse HTML very well.
That doesn't imply that there are no suitable PHP libraries.
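For example, PHP ships with a DOM extension (DOMDocument) that can handle the row layout from the question without any third-party code. The following is only a minimal sketch, not a tested scraper for RobotEvents: parseEventRows and the array keys are names made up for illustration, and it assumes every data row has at least three <td> cells with the link in the third one.

```php
<?php
// Sketch using PHP's bundled DOM extension (DOMDocument).
// parseEventRows() is a hypothetical helper name, not part of any library.
function parseEventRows(string $html): array
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML, so keep libxml warnings out of the output.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $rows = [];
    foreach ($doc->getElementsByTagName('tr') as $tr) {
        $cells = $tr->getElementsByTagName('td');
        if ($cells->length < 3) {
            continue; // header rows (<th>) or rows with a different layout
        }
        // First <a> inside the third cell, if there is one
        $link = $cells->item(2)->getElementsByTagName('a')->item(0);
        $rows[] = [
            'date'     => trim($cells->item(0)->textContent),
            'location' => trim($cells->item(1)->textContent),
            'url'      => $link ? $link->getAttribute('href') : null,
            'name'     => trim($cells->item(2)->textContent),
        ];
    }
    return $rows;
}

// Usage with the row from the question, wrapped in a <table>
$sample = '<table><tr><td>DATE</td><td>LOCATION</td>'
        . '<td><a href="URL">NAME</a></td></tr></table>';
print_r(parseEventRows($sample));
```

Each returned row is a plain associative array, so it can be passed to a prepared MySQL INSERT statement afterwards.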
Answer 2:
Take a look at the PHP Simple HTML DOM Parser library.
To use it, you can do something similar to this (not my example):
require('simple_html_dom.php');
$table = array();
$html = file_get_html('http://flow935.com/playlist/flowhis.HTM');
foreach ($html->find('tr') as $row) {
    // Skip header or spacer rows that do not have three <td> cells
    if (count($row->find('td')) < 3) {
        continue;
    }
    $time   = $row->find('td', 0)->plaintext;
    $artist = $row->find('td', 1)->plaintext;
    $title  = $row->find('td', 2)->plaintext;
    $table[$artist][$title] = true;
}
echo '<pre>';
print_r($table);
echo '</pre>';
There are some tutorials, SO questions and interesting reads about the library. It seems to be pretty popular.
- http://davidwalsh.name/php-notifications
- http://net.tutsplus.com/tutorials/php/html-parsing-and-screen-scraping-with-the-simple-html-dom-library/
- Looping through a table with Simple HTML DOM
- how to print cells of a table with simple html dom
UPDATE FOR FINDING SPECIFIC TABLE IN HTML USING ABOVE LIBRARY
To find a particular table amongst many:
1. By class:
On line 465 of your scraped HTML, the table starts with the class catalog-listing, so:
// The index 0 makes find() return the first matching table instead of an array
foreach ($html->find('table[class=catalog-listing]', 0)->find('tr') as $row) {
    // extract TD data
}
2. By instance (find the 2nd table in the HTML; the index is zero-based, so 1 is the second table)
foreach ($html->find('table', 1)->find('tr') as $row) {
    // extract TD data
}
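If you would rather not depend on the library, the same two ways of picking a table can be sketched with PHP's bundled DOMDocument and DOMXPath. This swaps XPath in for the library's selectors, so treat it as an illustration under assumptions: findTableRows is a hypothetical helper, and the class name catalog-listing is taken from the note above and should be checked against the actual page.

```php
<?php
// findTableRows() is a hypothetical helper; it is not part of Simple HTML DOM.
// It returns the trimmed <td> texts of every row in the first table
// matched by the given XPath expression.
function findTableRows(string $html, string $tableQuery): array
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence malformed-HTML warnings
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $table = $xpath->query($tableQuery)->item(0);
    if ($table === null) {
        return []; // no table matched the query
    }

    $rows = [];
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        if ($cells) {
            $rows[] = $cells; // rows without <td> cells are skipped
        }
    }
    return $rows;
}

// 1. By class (also matches when the table has several class names)
$byClass = '//table[contains(concat(" ", normalize-space(@class), " "), " catalog-listing ")]';

// 2. By instance: XPath positions start at 1, so [2] is the second table
$bySecond = '(//table)[2]';
```

Calling findTableRows($html, $byClass) or findTableRows($html, $bySecond) on the downloaded page source then gives one array of cell texts per row.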
Source: https://stackoverflow.com/questions/20724728/parse-html-table-php