How to remove empty HTML tags (which contain whitespace and/or its HTML codes)

Posted by 送分小仙女 on 2020-01-01 12:35:12

Question


I need a regex for preg_replace.

This question isn't answered by "another question" because not all of the tags I want to remove are strictly empty.

I need to remove not only empty tags from an HTML structure, but also tags that contain nothing except line breaks, whitespace, and/or the corresponding HTML codes.

Possible codes are:

<br /> &nbsp; &thinsp; &ensp; &emsp; &#8201; &#8194; &#8195;

BEFORE removing matching tags:

<div> 
  <h1>This is a html structure.</h1> 
  <p>This is not empty.</p> 
  <p></p> 
  <p><br /></p>
  <p> <br /> &thinsp;</p>
  <p>&nbsp;</p> 
  <p> &nbsp; </p> 
</div>

AFTER removing matching tags:

<div> 
  <h1>This is a html structure.</h1> 
  <p>This is not empty.</p> 
</div>

Answer 1:


You can use the following:

<([^>\s]+)[^>]*>(?:\s*(?:<br \/>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*<\/\1>

Replace each match with '' (the empty string).


Note: This also works for empty HTML tags that have attributes.
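
As a rough sketch of how the pattern above might be applied with preg_replace: the function name removeEmptyTags, the '~' delimiters, and the repeat-until-stable loop are assumptions added for illustration and are not part of the answer. The loop is there only so that a wrapper such as <div><p></p></div>, which becomes empty after the first pass, is removed on a later pass.

```php
<?php
// Pattern from the answer above, wrapped in '~' delimiters (an arbitrary choice).
$pattern = '~<([^>\s]+)[^>]*>(?:\s*(?:<br \/>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*<\/\1>~';

// Hypothetical helper: repeat the replacement until no more matches are found.
function removeEmptyTags(string $html, string $pattern): string
{
    do {
        $result = preg_replace($pattern, '', $html, -1, $count);
        if ($result === null) {
            // preg_replace() returns null on error; keep the last good string.
            return $html;
        }
        $html = $result;
    } while ($count > 0);
    return $html;
}

// Sample input taken from the question.
$html = <<<HTML
<div>
  <h1>This is a html structure.</h1>
  <p>This is not empty.</p>
  <p></p>
  <p><br /></p>
  <p> <br /> &thinsp;</p>
  <p>&nbsp;</p>
  <p> &nbsp; </p>
</div>
HTML;

$cleaned = removeEmptyTags($html, $pattern);
echo $cleaned, "\n";
```

The whitespace that surrounded the removed tags stays in the output, so blank indented lines remain where the empty paragraphs used to be. Trim them separately if that matters.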




Answer 2:


Use tidy, for example through the following function:

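function cleaning($string, $tidyConfig = null) {
    $out = array();
    // Default options, used when the caller passes no configuration.
    $config = array(
        'indent' => true,
        'show-body-only' => false,
        'clean' => true,
        'output-xhtml' => true,
        'preserve-entities' => true
    );
    if ($tidyConfig == null) {
        $tidyConfig = $config;
    }
    $tidy = new tidy();
    // Repair the markup and keep the complete document as a string.
    $out['full'] = $tidy->repairString($string, $tidyConfig, 'UTF8');
    unset($tidy);
    unset($tidyConfig);
    // Keep only what lies between <body ...> and </body>.
    $out['body'] = preg_replace("/.*<body[^>]*>|<\/body>.*/si", "", $out['full']);
    // Keep only what lies between <style ...> and </style>, wrapped in a new style tag.
    // If the document has no <style> element, nothing is stripped here and the
    // whole document ends up inside the wrapper.
    $out['style'] = '<style type="text/css">' . preg_replace("/.*<style[^>]*>|<\/style>.*/si", "", $out['full']) . '</style>';
    return $out;
}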



Answer 3:


I'm not so good with regex, but try this:

\<.*\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\<\s*br\s*\/\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\&.*sp;\s*\<\s*br\s*\/\>\<\/.*\>

It basically matches:

  • Tags with HTML space entities in them, OR
  • Tags with breaks occurring before HTML space entities in them, OR
  • Tags with breaks occurring after HTML space entities in them
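
One thing worth checking before relying on the pattern above is that each .* is greedy. The sketch below uses a made-up single-line input to show that a non-empty paragraph sitting on the same line as an "empty" one can be removed along with it. The variable names and the stricter variant built from negated character classes are assumptions for illustration, not part of the answer, and the variant covers only the first of the three cases.

```php
<?php
// Pattern from the answer above, wrapped in '~' delimiters.
$greedy = '~\<.*\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\<\s*br\s*\/\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\&.*sp;\s*\<\s*br\s*\/\>\<\/.*\>~';

// Hypothetical input: a non-empty paragraph followed by an "empty" one.
$line = '<p>Keep me.</p><p>&nbsp;</p>';

// The first '.*' stretches from the first '<' to the last '>' that still lets
// the rest of the pattern match, so the whole line is treated as one match.
$afterGreedy = preg_replace($greedy, '', $line);
var_dump($afterGreedy); // string(0) ""

// Illustrative stricter variant (first case only): '[^>]*' and '[^;]*'
// cannot run past the end of a tag or an entity.
$strict = '~<[^>]*>\s*&[^;]*sp;\s*</[^>]*>~';
$afterStrict = preg_replace($strict, '', $line);
var_dump($afterStrict); // string(15) "<p>Keep me.</p>"
```

Because . does not match a newline by default, the greedy pattern cannot over-match across lines, so this problem shows up mainly in minified or single-line HTML.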


Source: https://stackoverflow.com/questions/30865464/how-to-remove-empty-html-tags-wich-containing-whitespaces-and-or-their-html-cod
