Question
I am trying to scrape web data using PHP and DOMXPath. When I store $node->nodeValue in my database, or even just echo it, all the tags such as <p> and <br> are missing, so all the paragraphs come out concatenated. How can I solve this problem?
Answer 1:
If you have a node, and you need all its contents as they are, you can use this function:
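function innerHTML(DOMNode $node)
{
    // Build a scratch document that holds only the node's children.
    $doc = new DOMDocument();
    foreach ($node->childNodes as $child) {
        // Deep-copy each child (true = include descendants) into the scratch document.
        $doc->appendChild($doc->importNode($child, true));
    }
    // Serialize the children back to an HTML string, tags included.
    return $doc->saveHTML();
}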
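As a rough usage sketch, the snippet below contrasts nodeValue with a serialized version of the same node. The sample markup, the XPath expression, and the helper name innerHtmlViaOwner are assumptions made for illustration; the helper is a variant of the function above that passes each child to DOMDocument::saveHTML() on the owning document rather than copying it into a scratch document.

```php
<?php
// Variant of innerHTML(): serialize each child through the owning document.
// The function name is hypothetical; saveHTML($node) is a standard DOMDocument call.
function innerHtmlViaOwner(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Hypothetical sample markup standing in for a scraped page.
$page = '<div id="content"><p>First paragraph.</p><p>Second<br>paragraph.</p></div>';

// Collect libxml warnings internally instead of printing them,
// since real-world HTML is often not well-formed.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($page);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$node = $xpath->query('//div[@id="content"]')->item(0);

$textOnly = $node->nodeValue;          // text content only, tags dropped
$withTags = innerHtmlViaOwner($node);  // markup of the children preserved

echo $textOnly, "\n";
echo $withTags, "\n";
```

Storing the second string in the database instead of nodeValue should keep the paragraph and line-break markup.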
Answer 2:
If you're walking the DOM, the tags as they were written no longer exist. They have become nodes in the tree, and the only thing you have access to in "string form" is the raw content contained in them. You can, of course, use the node information to reconstruct the tags, but they won't be the original tags (e.g., you will have to choose between <BR> and <br>; you won't know which one the site originally used). If you want the original tags from the start, keep the original stream of bytes returned by the GET/POST request you made; don't parse it into a DOM tree.
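The point about reconstructed tags can be illustrated with a small sketch. The $raw string here is a made-up stand-in for the bytes of an HTTP response, and the variable names are assumptions. Parsing and re-serializing it yields equivalent markup, but the original spelling of the tags is only available in the raw string.

```php
<?php
// Hypothetical raw response body; a real scraper would keep the string
// returned by its HTTP client here.
$raw = '<P>One<BR>Two</P>';

libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($raw);
libxml_clear_errors();

// The parser stores element names in lowercase, so look up 'p'.
$p = $doc->getElementsByTagName('p')->item(0);

// Tags rebuilt from the tree: same structure, not the original bytes.
$rebuilt = $doc->saveHTML($p);

echo $raw, "\n";
echo $rebuilt, "\n";
```

If the exact original markup matters, storing $raw next to whatever is extracted from the DOM keeps both available.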
Source: https://stackoverflow.com/questions/5349310/how-to-scrape-web-page-data-without-losing-tags