Question
Using DOMDocument, I'm trying to read a portion of an HTML file and display it on a different HTML page using the code below. The DIV I'm trying to access contains several <p> tags. The problem is that when DOM parses the file, it fetches only the text content between the <p> tags, stripping the tags themselves, so the paragraph formatting is lost: the text is merged and displayed as one paragraph. How can I keep the HTML formatting so that the paragraphs are displayed as they were in the source file?
HTML Code
<div class="text_container">
<h3>Title</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing eli.
Lorem ipsum dolor sit amet, consectetur adipiscing eli.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing eli.
Lorem ipsum dolor sit amet, consectetur adipiscing eli.</p>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing eli.
Lorem ipsum dolor sit amet, consectetur adipiscing eli.</p>
</div>
DOMDocument Code
<?php
$page = file_get_contents('word.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        // nodeValue returns only the text content, so the <p> tags are lost
        echo '<p>', $div->nodeValue, '</p>';
    }
}
?>
Answer 1:
You can define a custom function, DOMinnerHTML(), to retrieve an element's inner HTML rather than its text content. It works by temporarily creating a new document for each child node and serializing it:
<?php
function DOMinnerHTML($element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child) {
        // Copy each child (with its descendants) into a temporary document
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        // Serialize the temporary document, keeping the child's markup
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}
?>
Example usage:
$page = file_get_contents('word.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        $innerHtml = DOMinnerHTML($div);
        echo '<div>' . $innerHtml . '</div>';
    }
}
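As a possible alternative, DOMDocument::saveHTML() accepts an optional node argument (PHP 5.3.6 and later), so each child can be serialized directly from its owner document without a temporary one. The sketch below assumes that PHP version; the helper name innerHtmlOf() and the inline sample markup are hypothetical stand-ins, not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): returns the markup
// of all children of $element, i.e. its "inner HTML".
function innerHtmlOf(DOMNode $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) serializes only the given node and its descendants.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Inline sample standing in for the contents of word.php (assumption).
$page = '<div class="text_container"><h3>Title</h3>'
      . '<p>First paragraph.</p><p>Second paragraph.</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($page);

foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        // The <h3> and <p> tags survive because child nodes are serialized
        // as markup instead of being flattened by nodeValue.
        echo '<div>' . innerHtmlOf($div) . '</div>';
    }
}
```

This avoids creating a new document per child, but it behaves the same way as DOMinnerHTML() only if the running PHP version supports the node argument.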
Source: https://stackoverflow.com/questions/17065063/how-to-keep-html-formatting-intact-when-parsing-with-dom-no-tag-stripping