PHP: strip_tags - remove only certain tags (and their contents)?

谁说我不能喝 submitted on 2019-11-27 06:47:21

Question


I use the strip_tags() function, but I need to remove some tags along with all of their contents.

For example:

<div>
  <p class="test">
    Test A
  </p>
  <span>
    Test B
  </span>
  <div>
    Test C
  </div>
</div>

Let's say I need to get rid of the p and span tags and keep only:

<div>
  <div>
    Test C
  </div>
</div>

strip_tags() expects, as its second parameter, the tags that you want to KEEP.

In this particular example I could use strip_tags($html, "<div>"), but the HTML I'm scraping and the tags that need to be removed change all the time.
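For illustration, here is what that call does to the example markup. strip_tags() removes the disallowed tags themselves but not the text inside them, so even in this case "Test A" and "Test B" are left behind. The one-line $html string is a condensed copy of the question's markup (whitespace removed), not the original input.

```php
<?php
// Condensed copy of the question's markup (whitespace removed for brevity)
$html = '<div><p class="test">Test A</p><span>Test B</span><div>Test C</div></div>';

// Keep only <div>: the <p> and <span> tags are stripped,
// but the text they contained is not
echo strip_tags($html, '<div>');
// prints: <div>Test ATest B<div>Test C</div></div>
```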

I searched for hours for a function that suits my needs, but couldn't find anything useful.

Any ideas?


Answer 1:


Use a regular expression. Something like this should work:

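$tags = array('p', 'span');
// \1 refers back to the tag name captured by (...), so each opening tag
// is paired with a closing tag of the same name
$text = preg_replace('#<(' . implode('|', $tags) . ')>.*?</\1>#s', '', $text);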

This replaces each listed tag, and everything up to its matching closing tag, with an empty string.

Note that you may need to tweak it further, for example to allow for whitespace inside the tags, or for other cases that your example does not demonstrate.
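One case no amount of tweaking fixes is nested tags of the same name. The sketch below applies the simple pattern (with \1 as the backreference, and only span in the tag list) to a made-up nested snippet: the lazy .*? stops at the first closing tag it sees, which belongs to the inner element.

```php
<?php
// The simple pattern with a single tag name, applied to nested <span>s
$pattern = '#<(span)>.*?</\1>#s';
$html = '<div><span>outer <span>inner</span> tail</span></div>';

// The lazy .*? stops at the FIRST </span>, which closes the inner span,
// so the rest of the outer span survives
echo preg_replace($pattern, '', $html);
// prints: <div> tail</span></div>
```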

To match the tags with or without attributes, use this pattern instead (the \b stops p from also matching tags such as <pre>):

'#<(' . implode('|', $tags) . ')\b[^>]*>.*?</\1>#s'
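A minimal sketch that wraps the attribute-tolerant pattern into a reusable function. The name strip_tags_with_content() is hypothetical, and preg_quote() is an addition so tag names are matched literally. It assumes PHP 7.4+ (for the arrow function) and target tags that are not nested inside tags of the same name.

```php
<?php
// Hypothetical helper: remove the given tags together with their contents
function strip_tags_with_content(string $html, array $tags): string
{
    // Escape each tag name so it is matched literally inside the pattern
    $names = implode('|', array_map(fn($t) => preg_quote($t, '#'), $tags));

    // <(p|span)\b[^>]*>  opening tag, with or without attributes
    // .*?                contents, matched lazily (s lets . cross newlines)
    // </\1>              closing tag with the same name as the opening one
    return preg_replace('#<(' . $names . ')\b[^>]*>.*?</\1>#s', '', $html);
}

$html = <<<HTML
<div>
  <p class="test">
    Test A
  </p>
  <span>
    Test B
  </span>
  <div>
    Test C
  </div>
</div>
HTML;

echo strip_tags_with_content($html, ['p', 'span']);
```

Because the tag list is a plain array, it can change from page to page, which is what the question asks for.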



Answer 2:


You say that you are using Simple HTML DOM (Good! That's the right way to parse HTML). When I need to remove a tag and its contents, I do:

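// Find every <span> element in the parsed document
$rows = $html->find("span");

foreach ($rows as $row)
{
  // Blanking outertext removes the tag and everything inside it
  $row->outertext = "";
}

// Serialize and re-parse so the DOM reflects the removals
$html->load($html->save());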

The last line is required because the DOM gets confused after modifications are made: the whole document has to be serialized back to a string and parsed again for the changes to become permanent (IMO, a bug in Simple HTML DOM).

The Simple HTML DOM approach is safer and more stable than a regular expression, because a parser works on the element tree rather than on raw text.
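To illustrate the same idea without the third-party library, here is a sketch using PHP's bundled DOM extension (DOMDocument) instead of Simple HTML DOM. This is a swapped-in technique, and the helper name remove_elements() is hypothetical. Because the parser builds a real tree, removing an element takes its whole subtree with it, including nested elements of the same name that a lazy regex mishandles. It assumes the fragment has a single root element.

```php
<?php
// Swapped-in technique: PHP's bundled DOMDocument, not Simple HTML DOM.
// remove_elements() is a hypothetical helper name.
function remove_elements(string $html, array $tags): string
{
    $doc = new DOMDocument();

    // Collect libxml parse warnings internally instead of printing them
    $previous = libxml_use_internal_errors(true);
    // NOIMPLIED/NODEFDTD: don't add <html><body> wrappers or a doctype
    // (assumes the fragment has a single root element)
    $doc->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach ($tags as $tag) {
        // getElementsByTagName() returns a live list; copy it to an array
        // so removing nodes doesn't disturb the iteration
        $nodes = iterator_to_array($doc->getElementsByTagName($tag));
        foreach ($nodes as $node) {
            // Removing a node removes its whole subtree (text and children)
            if ($node->parentNode !== null) {
                $node->parentNode->removeChild($node);
            }
        }
    }

    return $doc->saveHTML();
}

echo remove_elements('<div><span>outer <span>inner</span> tail</span></div>', ['span']);
```

Unlike the Simple HTML DOM snippet, no reload step is needed here: removeChild() updates the tree directly.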



Source: https://stackoverflow.com/questions/11165895/php-strip-tags-remove-only-certain-tags-and-their-contents
