Question
Is there any way to extract the content of an HTML
page that starts at <body>
and ends at </body>
in PHP? If so, could anyone post some sample code?
Answer 1:
You should have a look at the DOMDocument reference.
This example reads an HTML document, creates a DOMDocument
from it, and gets the body element:
libxml_use_internal_errors(true);
$dom = new DOMDocument;
$dom->loadHTMLFile('http://example.com');
libxml_use_internal_errors(false);
$body = $dom->getElementsByTagName('body')->item(0);
echo $body->textContent; // print all the text content in the body
You should also check out the following resources:
DOM API Documentation
XPath language specification
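The example above prints only the text of the body. If the markup between <body> and </body> is wanted instead, one possible approach is to serialize each child node of the body with saveHTML. This is a minimal sketch, not part of the original answer: the helper name bodyInnerHtml is hypothetical, and it takes an HTML string rather than a URL so that it can be tried without network access.

```php
<?php
// Hypothetical helper (not from the original answer): returns the markup
// inside <body>, or null if no body element is found.
function bodyInnerHtml(string $html): ?string
{
    // Collect parser warnings internally so sloppy markup does not print notices.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument;
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return null;
    }

    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $dom->saveHTML($child); // serialize each child node in turn
    }
    return $inner;
}

echo bodyInnerHtml('<html><body><p>Hello <b>world</b></p></body></html>');
```

Because the parser normalizes the document, the returned markup may differ slightly from the original source (for example, in how unclosed tags are repaired).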
Answer 2:
Try the PHP Simple HTML DOM Parser:
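$html = file_get_html('http://www.example.com/');
$body = $html->find('body', 0); // index 0 returns the first match instead of an array
echo $body->innertext;          // the markup between <body> and </body>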
Answer 3:
You can also try a non-DOM solution based on the strpos
family of functions:
$html = file_get_contents($url);
$html = substr($html,stripos($html,'<body>')+6);
$html = substr($html,0,strripos($html,'</body>'));
stripos is the case-insensitive version of strpos, and strripos is the case-insensitive version of strrpos, which finds the rightmost (last) position of a substring.
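The snippet above searches for the literal string <body>, so it will not find an opening tag that carries attributes (such as <body class="home">), and it does not check whether either tag was found at all. A slightly more defensive variant could look like the sketch below. The function name bodyBetweenTags is hypothetical, and this is still plain string matching: it can be fooled by a "<body" that appears inside a comment or script, or by a ">" inside an attribute value.

```php
<?php
// Hypothetical helper (not from the original answer): string-based extraction
// that tolerates attributes on the opening tag and reports a missing body.
function bodyBetweenTags(string $html): ?string
{
    $open = stripos($html, '<body');      // start of the opening tag, any case
    if ($open === false) {
        return null;
    }
    $start = strpos($html, '>', $open);   // end of the opening tag
    $end = strripos($html, '</body>');    // last closing tag, any case
    if ($start === false || $end === false || $end <= $start) {
        return null;
    }
    // Everything after the opening tag's '>' and before the closing tag.
    return substr($html, $start + 1, $end - $start - 1);
}

echo bodyBetweenTags('<html><BODY class="x"><p>Hi</p></BODY></html>');
```

Returning null instead of a truncated string makes it easier for the caller to tell "no body found" apart from an empty body.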
Hope this helps!
Source: https://stackoverflow.com/questions/8878381/extract-a-content-of-a-html-page-in-php