PHP Simple HTML DOM Parser can't get content on pagination

Submitted by 故事扮演 on 2020-01-03 05:05:08

Question


Hi, I'm a beginner with simple_html_dom. I'm trying to fetch the list of hrefs from the posts on this sample website, which is paginated, using the code below.

<?php
include('simple_html_dom.php');

$html = file_get_html('http://www.themelock.com/wordpress/elegantthemes/');

function getArticles($page) {

    global $articles;

    $html = new simple_html_dom();
    $html->load_file($page);

    $items = $html->find('h2[class=post-title]');  

    foreach($items as $post) {
        $articles[] = array($post->children(0)->href);
    }

    foreach($articles as $item) {
        echo "<div class='item'>";
        echo $item[0];
        echo "</div>";
    }
}

if($next = $html->find('div[class=navigation]', 0)->last_child() ) {
    $URL = $next->href;

    $html->clear();
    unset($html);

    getArticles($URL);
}

?>

As a result, I'm getting:

http://www.themelock.com/wordpress/908-minimal-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/892-event-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/882-askit-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/853-lightbright-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/850-inreview-elegantthemes-review-wordpress-theme.html
http://www.themelock.com/wordpress/807-boutique-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/804-elist-elegantthemes-directory-wordpress-theme.html
http://www.themelock.com/wordpress/798-webly-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/795-elegantestate-real-estate-elegantthemes-wordpress-theme.html
http://www.themelock.com/wordpress/786-notebook-elegantthemes-wordpress-theme.html

The code above fetches only the contents of the next (second) page. I'm wondering how to get the post URLs from the first page, followed by those from the subsequent pages.

Does anyone know how to do this?


Answer 1:


Thanks for your support, guys. I got this working with the code below:

<?php
include('simple_html_dom.php');

$url = "http://www.themelock.com/wordpress/yootheme-wordpress/";

// Start from the main page
$nextLink = $url;

// Loop over each next link for as long as one exists
while ($nextLink) {
    echo "<hr>nextLink: $nextLink<br>";
    //Create a DOM object
    $html = new simple_html_dom();
    // Load HTML from a url
    $html->load_file($nextLink);

    $posts = $html->find('h2[class=post-title]');

    foreach($posts as $post) {
        // Get the link from the first child of the heading
        $link = $post->children(0)->href;
        echo $link . '<br>';
    }

    // Extract the next link; use NULL (which ends the loop) if the
    // navigation block or its last child is missing
    $nav = $html->find('div[class=navigation]', 0);
    $last = $nav ? $nav->last_child() : NULL;
    $nextLink = $last ? $last->href : NULL;

    // Clear DOM object
    $html->clear();
    unset($html);
}

?>
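
The key point of the loop above is that each page, including the first, is parsed before its "next" link is followed. Below is a minimal, self-contained sketch of that same pattern which collects the links into an array instead of echoing them. It makes several assumptions that are not in the original: it uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs without extra files, it reads pages from a hypothetical in-memory array ($pages, with made-up example.test URLs) instead of fetching them over HTTP, and the helper names parsePage and collectLinks as well as the $maxPages guard are illustrative only.

```php
<?php
// Hypothetical in-memory "site" (URL => HTML) standing in for real HTTP fetches.
$pages = [
    'http://example.test/page/1' =>
        '<h2 class="post-title"><a href="/post-a.html">A</a></h2>'
        . '<h2 class="post-title"><a href="/post-b.html">B</a></h2>'
        . '<div class="navigation"><span>1</span>'
        . '<a href="http://example.test/page/2">Next</a></div>',
    'http://example.test/page/2' =>
        '<h2 class="post-title"><a href="/post-c.html">C</a></h2>'
        . '<div class="navigation">'
        . '<a href="http://example.test/page/1">Prev</a><span>2</span></div>',
];

// Parse one page: return [post links, next-page URL or null].
function parsePage(string $html): array
{
    $doc = new DOMDocument();
    // Suppress libxml warnings caused by imperfect markup.
    @$doc->loadHTML($html);
    $xpath = new DOMXPath($doc);

    $links = [];
    foreach ($xpath->query('//h2[@class="post-title"]/a[1]') as $a) {
        $links[] = $a->getAttribute('href');
    }

    // Treat the last element of the navigation block as "next" only if it is a link.
    $last = $xpath->query('//div[@class="navigation"]/*[last()]')->item(0);
    $next = ($last !== null && $last->nodeName === 'a')
        ? $last->getAttribute('href')
        : null;

    return [$links, $next ?: null];
}

// Walk the pagination from $startUrl, parsing the current page BEFORE following "next".
function collectLinks(array $pages, string $startUrl, int $maxPages = 50): array
{
    $all = [];
    $visited = [];
    $url = $startUrl;

    while ($url !== null
        && isset($pages[$url])
        && !isset($visited[$url])        // guard against navigation loops
        && count($visited) < $maxPages   // guard against runaway crawling
    ) {
        $visited[$url] = true;
        [$links, $next] = parsePage($pages[$url]);
        $all = array_merge($all, $links);
        $url = $next;
    }

    return $all;
}

$result = collectLinks($pages, 'http://example.test/page/1');
print_r($result);
```

To adapt this to a real site, the lookup in $pages would be replaced by an actual fetch of $url, and the selectors would need to match that site's markup.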


Source: https://stackoverflow.com/questions/22669373/php-simple-html-dom-parser-cant-get-content-on-pagination
