How to Keep HTML Formatting Intact When Parsing with DOM - (No Tag Stripping)

Submitted by 六眼飞鱼酱① on 2020-01-05 08:17:48

Question


Using DOMDocument, I'm trying to read a portion of an HTML file and display it on a different HTML page with the code below. The div I'm trying to access contains several <p> tags. The problem is that when DOM parses the file, it fetches only the text content between the <p> tags (the tags themselves are stripped), so the paragraph formatting is lost: the text is merged and displayed as a single paragraph. How can I keep the HTML formatting so that the paragraphs are displayed as they were in the source file?

HTML Code

<div class="text_container">
<h3>Title</h3>
<p>Lorem ipsum dolor sit amet, consectetur adipiscing eli. 
Lorem ipsum dolor sit amet, consectetur adipiscing eli.</p>     

<p>Lorem ipsum dolor sit amet, consectetur adipiscing eli. 
Lorem ipsum dolor sit amet, consectetur adipiscing eli.</p>

<p>Lorem ipsum dolor sit amet, consectetur adipiscing eli. 
Lorem ipsum dolor sit amet, consectetur adipiscing eli.</p>
</div>

DOMDocument Code

<?php

$page = file_get_contents('word.php');
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        // nodeValue returns only the text content, so the <p> tags are lost here
        echo '<p>', $div->nodeValue, '</p>';
    }
}

?>

Answer 1:


You can define a custom function, DOMinnerHTML(), to retrieve an element's inner HTML rather than its text content. It works by temporarily creating a new document for each child node:

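<?php
/**
 * Returns the inner HTML of a DOM element (its children, serialized with tags).
 */
function DOMinnerHTML($element)
{
    $innerHTML = "";
    $children = $element->childNodes;
    foreach ($children as $child)
    {
        // Copy each child into a temporary document and serialize it.
        $tmp_dom = new DOMDocument();
        $tmp_dom->appendChild($tmp_dom->importNode($child, true));
        $innerHTML .= trim($tmp_dom->saveHTML());
    }
    return $innerHTML;
}
?>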

Example usage:

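$page = file_get_contents('word.php'); // same source file as in the question
$doc = new DOMDocument();
$doc->loadHTML($page);
$divs = $doc->getElementsByTagName('div');
foreach ($divs as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        // Inner HTML keeps the <h3> and <p> tags intact
        $innerHtml = DOMinnerHTML($div);
        echo '<div>' . $innerHtml . '</div>';
    }
}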
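
As a possible alternative, not taken from the original answer: on PHP 5.3.6 or later, DOMDocument::saveHTML() accepts an optional node argument and serializes only that subtree, so the temporary document may not be needed. The sketch below assumes that behavior. The helper name innerHtmlOf() and the inline sample markup are hypothetical, chosen only for illustration.

```php
<?php
// Hypothetical helper (not from the original answer): builds the inner HTML
// by serializing each child node through the element's owning document.
function innerHtmlOf(DOMElement $element): string
{
    $html = '';
    foreach ($element->childNodes as $child) {
        // saveHTML($node) serializes just this node and its descendants.
        $html .= $element->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Sample markup standing in for the contents of word.php (an assumption).
$page = '<div class="text_container"><h3>Title</h3><p>First.</p><p>Second.</p></div>';

$doc = new DOMDocument();
$doc->loadHTML($page);

foreach ($doc->getElementsByTagName('div') as $div) {
    if ($div->getAttribute('class') === 'text_container') {
        // The <h3> and <p> tags are preserved in the output.
        echo '<div>' . innerHtmlOf($div) . '</div>';
    }
}
```

Note that comparing the class attribute with === only matches elements whose class is exactly text_container. If the element carries several classes, a different check would be needed.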


Source: https://stackoverflow.com/questions/17065063/how-to-keep-html-formatting-intact-when-parsing-with-dom-no-tag-stripping
