Extract the content of an HTML page in PHP

守給你的承諾、 submitted on 2019-12-23 07:05:31

Question


Is there any way in PHP to extract the content of an HTML page that starts at <body> and ends at </body>? If so, could anyone post some sample code?


Answer 1:


You should have a look at the DOMDocument reference.

This example reads an HTML document into a DOMDocument and gets the body element:

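libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTMLFile('http://example.com');
libxml_clear_errors();              // discard the collected warnings
libxml_use_internal_errors(false);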

$body = $dom->getElementsByTagName('body')->item(0);

echo $body->textContent; // print all the text content in the body
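The example above prints only the text inside the body. If the markup between <body> and </body> is wanted instead, one possible approach is to serialize each child node of the body with DOMDocument::saveHTML(). The following is a minimal sketch, not part of the original answer; the helper name body_inner_html and the sample string are made up for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): returns the markup
// found between <body> and </body> as a string.
function body_inner_html(string $html): string
{
    // loadHTML() rejects an empty string on PHP 8, so bail out early.
    if ($html === '') {
        return '';
    }

    // Collect parse warnings internally instead of emitting them,
    // and restore the previous setting afterwards.
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    // saveHTML($node) serializes a single node; concatenating the
    // serialized children yields the body's inner markup.
    $inner = '';
    foreach ($body->childNodes as $child) {
        $inner .= $dom->saveHTML($child);
    }
    return $inner;
}

// Made-up sample input for illustration.
$sample = '<html><head><title>t</title></head>'
        . '<body class="main"><p>Hello</p><p>World</p></body></html>';
echo body_inner_html($sample), PHP_EOL;
```

Note that the parser normalizes the markup, so the result may differ from the original bytes, and input without a declared charset may need extra care with non-ASCII text.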

You should also check out the following resources:

DOM API Documentation
XPath language specification




Answer 2:


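Try the PHP Simple HTML DOM Parser:

include 'simple_html_dom.php'; // third-party library; the path here is an assumption
$html = file_get_html('http://www.example.com/');
$body = $html->find('body', 0); // index 0 returns the first match rather than an array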



Answer 3:


You can also try a non-DOM solution based on the strpos family of functions:

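$html = file_get_contents($url);
// Drop everything up to and including '<body>' (6 characters).
// This assumes a bare <body> tag with no attributes.
$html = substr($html, stripos($html, '<body>') + 6);
// Drop everything from the last '</body>' onward.
$html = substr($html, 0, strripos($html, '</body>'));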

stripos is the case-insensitive version of strpos; strripos is the case-insensitive version of strrpos, which finds the rightmost (last) occurrence.
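The snippet above searches for the literal string '<body>', so it misses tags such as <body class="main">, and a failed search is not detected. A slightly more defensive variant might look like the sketch below. The function name body_between_tags is hypothetical, and the code assumes that no '>' character appears inside the attribute values of the body tag.

```php
<?php
// Hypothetical helper (not from the original answer): string-only extraction
// that tolerates attributes on the <body> tag. Returns null when no body
// section can be located.
function body_between_tags(string $html): ?string
{
    // Find the start of the opening tag, with or without attributes.
    $open = stripos($html, '<body');
    if ($open === false) {
        return null;
    }

    // Find the '>' that closes the opening tag.
    // Assumption: no '>' occurs inside an attribute value.
    $start = strpos($html, '>', $open);
    if ($start === false) {
        return null;
    }
    $start++; // first character after the opening tag

    // Find the last closing tag, case-insensitively.
    $end = strripos($html, '</body');
    if ($end === false || $end < $start) {
        return null;
    }

    return substr($html, $start, $end - $start);
}

// Made-up sample input for illustration.
$page = '<html><BODY class="a"><p>Hi</p></BODY></html>';
var_dump(body_between_tags($page));
```

Returning null instead of a truncated string makes a missing or malformed body section easy to detect at the call site.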

Hope that it will help you!



Source: https://stackoverflow.com/questions/8878381/extract-a-content-of-a-html-page-in-php
