Question
Hi, I'm a beginner with simple_html_dom. I'm trying to fetch the list of hrefs from the list of posts on this paginated sample website using the code below.
<?php
include('simple_html_dom.php');
$html = file_get_html('http://www.themelock.com/wordpress/elegantthemes/');

function getArticles($page) {
    global $articles;
    $html = new simple_html_dom();
    $html->load_file($page);
    $items = $html->find('h2[class=post-title]');
    foreach($items as $post) {
        $articles[] = array($post->children(0)->href);
    }
    foreach($articles as $item) {
        echo "<div class='item'>";
        echo $item[0];
        echo "</div>";
    }
}

if($next = $html->find('div[class=navigation]', 0)->last_child() ) {
    $URL = $next->href;
    $html->clear();
    unset($html);
    getArticles($URL);
}
?>
As a result I'm getting:
http://www.themelock.com/wordpress/908-minimal-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/892-event-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/882-askit-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/853-lightbright-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/850-inreview-elegantthemes-review-wordpress-theme.html
http://www.themelock.com/wordpress/807-boutique-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/804-elist-elegantthemes-directory-wordpress-theme.html
http://www.themelock.com/wordpress/798-webly-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/795-elegantestate-real-estate-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/786-notebook-elegantthemes-wordpress-theme.html
The code above fetches only the contents of the next (second) page. I'm wondering how to get the post URLs from the first page, followed by those from the next pages.
Does anyone know how to do this?
Answer 1:
Thanks for your support, guys. I got this to work using the code below:
<?php
include('simple_html_dom.php');
$url = "http://www.themelock.com/wordpress/yootheme-wordpress/";
// Start from the main page
$nextLink = $url;
// Loop over each next link for as long as one exists
while ($nextLink) {
    echo "<hr>nextLink: $nextLink<br>";
    // Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a URL
    $html->load_file($nextLink);
    $posts = $html->find('h2[class=post-title]');
    foreach($posts as $post) {
        // Get the link from the first child of the post title
        $articles = $post->children(0)->href;
        echo $articles.'<br>';
    }
    // Extract the next link; NULL if there is no navigation block or no further link
    $nav = $html->find('div[class=navigation]', 0);
    $temp = $nav ? $nav->last_child() : NULL;
    $nextLink = ($temp && $temp->href) ? $temp->href : NULL;
    // Clear the DOM object to free memory
    $html->clear();
    unset($html);
}
?>
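For anyone who would rather collect the links into an array than echo them inside the loop, here is a minimal sketch of the same "follow the last navigation link" idea. It is not the code I actually ran: it swaps simple_html_dom for PHP's built-in DOMDocument/DOMXPath so it has no external include. The names collectPostLinks, $fetch and $maxPages are made up for illustration. It assumes the class attributes are exactly "post-title" and "navigation", and that the hrefs are absolute URLs, as in the output shown in the question.

```php
<?php
// Sketch only: uses PHP's built-in DOM extension instead of simple_html_dom.
// collectPostLinks(), $fetch and $maxPages are hypothetical names.

/**
 * Follow "next" links starting at $startUrl and return every post URL found.
 *
 * $fetch    callable taking a URL and returning the page HTML, or false on failure.
 * $maxPages safety cap so a broken "next" link cannot make the loop run forever.
 */
function collectPostLinks($startUrl, callable $fetch, $maxPages = 50)
{
    $links = array();
    $seen = array();
    $nextLink = $startUrl;

    // Stop when there is no next link, when a page repeats, or at the cap.
    while ($nextLink && !isset($seen[$nextLink]) && count($seen) < $maxPages) {
        $seen[$nextLink] = true;

        $body = $fetch($nextLink);
        if ($body === false || $body === '') {
            break; // page could not be loaded
        }

        // Real-world HTML is rarely valid, so collect parser warnings silently.
        libxml_use_internal_errors(true);
        $doc = new DOMDocument();
        $doc->loadHTML($body);
        libxml_clear_errors();
        $xpath = new DOMXPath($doc);

        // First element child of every <h2 class="post-title">
        // (assumes an exact class attribute match).
        foreach ($xpath->query('//h2[@class="post-title"]/*[1]') as $node) {
            $href = $node->getAttribute('href');
            if ($href !== '') {
                $links[] = $href;
            }
        }

        // Last element child of the first <div class="navigation">;
        // follow it only if it actually carries an href.
        $last = $xpath->query('//div[@class="navigation"]/*[last()]')->item(0);
        $nextLink = ($last && $last->hasAttribute('href'))
            ? $last->getAttribute('href')
            : null;
    }

    return $links;
}

// Possible usage (performs real HTTP requests, so left commented out):
// $fetch = function ($url) { return @file_get_contents($url); };
// foreach (collectPostLinks('http://www.themelock.com/wordpress/yootheme-wordpress/', $fetch) as $link) {
//     echo $link . '<br>';
// }
```

Passing the fetcher in as a callable keeps the crawling logic separate from the HTTP call, so the function can be tried against a few hard-coded HTML strings before pointing it at the real site.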
Source: https://stackoverflow.com/questions/22669373/php-simple-html-dom-parser-cant-get-content-on-pagination