PHP scrape an html page

Question

I am trying to scrape an HTML page with PHP. I searched Google for how to do it and am using the file_get_contents() function. I have only written a little code, but I am already hitting a problem that I cannot figure out:

    $page = file_get_contents( 'http://php.net/supported-versions.php' );
    $doc = new DOMDocument( $page );
    //print_r( $page );

    foreach ( $doc->getElementsByTagName( 'table' ) as $node ) {
        print_r( $node );
    }

The first, commented-out print_r statement does print the page, but the foreach loop, which should put each table on the page into $node, prints nothing. What am I doing wrong?

Answer 1:

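You are loading the DOMDocument incorrectly. The constructor does not accept markup (its arguments are the XML version and the encoding), so the document stays empty and no tables are found. You need to call ->loadHTMLFile() or something similar, such as ->loadHTML(). See the DOMDocument documentation.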

Here is what you need to do instead.

<?php
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTMLFile("http://php.net/supported-versions.php");
    foreach($doc->getElementsByTagName('table') as $table){
        var_dump($table);
    }
?>

The line libxml_use_internal_errors(true); makes sure no warnings are emitted while the HTML is loaded; libxml collects them internally instead. The parser does not accept HTML5 tags such as nav and section as "correct" HTML, for instance, and would otherwise warn about them.
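
Since the question already has the page in a string from file_get_contents(), a variant using ->loadHTML() may be closer to the original code. The following is only a minimal sketch under stated assumptions: the inline $page markup is a made-up stand-in for the downloaded page (so the example runs without network access), and the $rows array and the cell-extraction loop are illustrative, not part of the answer above.

```php
<?php
// Hypothetical stand-in for the string returned by file_get_contents().
$page = '<html><body><nav>Menu</nav>'
      . '<table><tr><th>Branch</th><th>Status</th></tr>'
      . '<tr><td>A</td><td>Supported</td></tr></table>'
      . '</body></html>';

// Collect libxml parse warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($page); // parse markup that is already in memory

// Discard the collected warnings and restore the earlier setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Gather the text of every header and data cell, one array per table row.
$rows = [];
foreach ($doc->getElementsByTagName('table') as $table) {
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->childNodes as $cell) {
            if ($cell instanceof DOMElement
                && in_array($cell->nodeName, ['td', 'th'], true)) {
                $cells[] = trim($cell->textContent);
            }
        }
        $rows[] = $cells;
    }
}

print_r($rows);
```

Reading textContent from each cell gives plain strings to work with, which is usually more useful for scraping than dumping the DOMElement objects themselves.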

Source: https://stackoverflow.com/questions/41814491/php-scrape-an-html-page

Tags

php

html

scrape
