Question
I am trying to scrape an HTML page with PHP. I searched Google for how to do it, and I am using the file_get_contents()
function. I wrote a little bit of code, but I am already running into a problem that I cannot figure out:
$page = file_get_contents( 'http://php.net/supported-versions.php' );
$doc = new DOMDocument( $page );
//print_r( $page );
foreach ( $doc->getElementsByTagName( 'table' ) as $node ) {
print_r( $node );
}
The first, commented-out print_r statement DOES print the page, but the foreach loop, which should put each table into $node in turn, prints nothing. What am I doing wrong?
Answer 1:
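You are loading your DOMDocument the wrong way. The constructor takes an XML version and an encoding, not markup, so the document stays empty. You need to call ->loadHTMLFile() or something similar. See the documentation here.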
Here is what you need to do instead.
<?php
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTMLFile("http://php.net/supported-versions.php");
foreach($doc->getElementsByTagName('table') as $table){
var_dump($table);
}
?>
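If the page has already been fetched into a string, as in the question, loadHTML() can parse that string directly instead of downloading the URL a second time. Here is a minimal sketch of that variant. The inline HTML string is a made-up stand-in for the downloaded page, and the variable names are only illustrative.

```php
<?php
// Stand-in for the string returned by file_get_contents() (hypothetical sample markup).
$page = '<html><body><nav>menu</nav>'
      . '<table><tr><td>7.0</td></tr></table>'
      . '<table><tr><td>7.1</td></tr></table>'
      . '</body></html>';

// Collect libxml messages internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Create an empty document first, then load the HTML string into it.
$doc = new DOMDocument();
$doc->loadHTML($page);

$tables = $doc->getElementsByTagName('table');
foreach ($tables as $table) {
    // textContent is the concatenated text of the node and its descendants.
    echo trim($table->textContent), PHP_EOL;
}
```

With a real page, $page would simply be the result of the file_get_contents() call from the question.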
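The line libxml_use_internal_errors(true); keeps libxml's parser warnings from being emitted while the HTML is loaded. They are collected internally instead. For instance, the parser may not recognize HTML5 tags such as nav and section as valid HTML, and would otherwise warn about them.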
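The collected messages are not lost. They can be read back with libxml_get_errors() and discarded with libxml_clear_errors(). Here is a small sketch, assuming a made-up HTML string that contains nav and section. Whether any messages are actually recorded for these tags depends on the libxml2 version PHP was built against.

```php
<?php
// Hypothetical sample markup containing HTML5 elements.
$html = '<html><body><nav>menu</nav><section>'
      . '<table><tr><td>x</td></tr></table>'
      . '</section></body></html>';

// Buffer parser messages instead of raising PHP warnings.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Each entry is a LibXMLError object with level, code, line and message properties.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo 'line ', $error->line, ': ', trim($error->message), PHP_EOL;
}

// Empty the internal buffer so old messages do not pile up across documents.
libxml_clear_errors();

// The table is still found even if the parser complained about other tags.
$tableCount = $doc->getElementsByTagName('table')->length;
echo 'tables found: ', $tableCount, PHP_EOL;
```

Logging the messages rather than ignoring them can help when a page parses into an unexpected tree.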
Source: https://stackoverflow.com/questions/41814491/php-scrape-an-html-page