Question
I am using the Simple HTML DOM parser with PHP to get the title, description, and images from a website. The problem is that the parsed output includes HTML I don't want, and I don't know how to exclude those tags. Details below.
Here is a sample of the HTML structure being parsed:
<div id="product_description">
<p> Some text</p>
<ul>
<li>value 1</li>
<li>value 2</li>
<li>value 3</li>
</ul>
<!-- the div I don't want -->
<div id="comments">
<h1> Some Text </h1>
</div>
</div>
I am using the PHP script below to parse it:
foreach($html->find('div#product_description') as $description)
{
echo $description->outertext ;
echo "<br>";
}
The code above outputs everything inside the div with id "product_description". I want to exclude the div with id "comments". I tried converting the output to a string and using substr to cut off the end, but that is not working and I don't know why. Any idea how I can do this? Any approach that lets me exclude that div from the parsed HTML will work. Thanks.
Answer 1:
You can remove the elements you don't want by setting their outertext to an empty string (outertext = ''):
$src =<<<src
<div id="product_description">
<p> Some text</p>
<ul>
<li>value 1</li>
<li>value 2</li>
<li>value 3</li>
</ul>
<!-- the div I don't want -->
<div id="comments">
<h1> Some Text </h1>
</div>
</div>
src;
$html = str_get_html($src);
foreach($html->find('#product_description') as $description)
{
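$comments = $description->find('#comments', 0);
// find() with an index returns null when nothing matches, so guard before assigning
if ($comments !== null) {
    $comments->outertext = '';
}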
print $description->outertext ;
}
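For comparison, the same idea (drop one child element, then print the container) can be sketched with PHP's bundled DOM extension instead of Simple HTML DOM. This is a different technique from the answer above, not part of that library. The helper name removeById and its parameters are made up for this example, and the sketch assumes the ids contain no quote characters, since they are interpolated directly into the XPath expression.

```php
<?php
// Sketch using PHP's built-in DOMDocument/DOMXPath, not Simple HTML DOM.
// removeById() is a hypothetical helper name chosen for this example.
function removeById(string $html, string $containerId, string $unwantedId): string
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);

    // Locate the container element; return an empty string if it is absent.
    $container = $xpath->query("//*[@id='$containerId']")->item(0);
    if ($container === null) {
        return '';
    }

    // Copy the matches into an array first, then detach each one,
    // so the node list is not modified while it is being iterated.
    $unwanted = iterator_to_array($xpath->query(".//*[@id='$unwantedId']", $container));
    foreach ($unwanted as $node) {
        $node->parentNode->removeChild($node);
    }

    // Serialize only the container, not the html/body wrapper added by loadHTML().
    return $doc->saveHTML($container);
}

$src = <<<HTML
<div id="product_description">
<p> Some text</p>
<ul>
<li>value 1</li>
<li>value 2</li>
<li>value 3</li>
</ul>
<div id="comments">
<h1> Some Text </h1>
</div>
</div>
HTML;

$result = removeById($src, 'product_description', 'comments');
echo $result;
```

Unlike assigning to outertext, this actually detaches the node from the tree, so later queries on the same document will no longer find it.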
Answer 2:
OK, so I figured it out myself: just use the Advanced HTML Dom library. It is fully compatible with Simple HTML DOM and gives you much more control. Removing what you don't want from the parsed HTML is very simple. For example:
// to remove script tags
$scripts = $description->find('script')->remove;
// to remove CSS style tags
$style = $description->find('style')->remove;
// to remove a div with class name findify-element
$findify = $description->find('div.findify-element')->remove;
Source: https://stackoverflow.com/questions/61014488/exclude-non-wanted-html-from-simple-html-dom-php