Question
I need a regex for preg_replace.
This question wasn't answered in "another question" because not all of the tags I want to remove are empty.
I need to remove not only empty tags from an HTML structure, but also tags that contain nothing except line breaks, white space, and/or the HTML codes for white space.
Possible codes are:
<br /> &thinsp; &ensp; &emsp; &#8201; &#8194; &#8195; &nbsp;
BEFORE removing matching tags:
<div>
<h1>This is a html structure.</h1>
<p>This is not empty.</p>
<p></p>
<p><br /></p>
<p> <br /> &thinsp;</p>
<p> </p>
<p> </p>
</div>
AFTER removing matching tags:
<div>
<h1>This is a html structure.</h1>
<p>This is not empty.</p>
</div>
Answer 1:
You can use the following:
<([^>\s]+)[^>]*>(?:\s*(?:<br \/>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*<\/\1>
Replace each match with '' (the empty string).
Note: This also works for empty HTML tags that have attributes.
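A minimal sketch of how this pattern might be applied with preg_replace. The helper name removeEmptyTags is hypothetical, and the list of entities in the pattern is an assumption about which whitespace codes need to be covered. Since a single pass only removes the innermost empty tags, the sketch repeats the replacement until nothing changes, so a wrapper that becomes empty after one pass is removed in the next.

```php
<?php
// Hypothetical helper, not part of the original answer.
function removeEmptyTags(string $html): string
{
    // Assumed entity list; extend it if other whitespace codes occur.
    $pattern = '/<([^>\s]+)[^>]*>(?:\s*(?:<br \/>|&nbsp;|&thinsp;|&ensp;|&emsp;|&#8201;|&#8194;|&#8195;)\s*)*<\/\1>/';

    // Repeat until a pass removes nothing, so that wrappers emptied
    // by an earlier pass are removed as well.
    do {
        $result = preg_replace($pattern, '', $html, -1, $count);
        if ($result === null) {
            throw new RuntimeException('preg_replace failed: ' . preg_last_error_msg());
        }
        $html = $result;
    } while ($count > 0);

    return $html;
}

$input = "<div>\n"
    . "<h1>This is a html structure.</h1>\n"
    . "<p>This is not empty.</p>\n"
    . "<p></p>\n"
    . "<p><br /></p>\n"
    . "<p>&nbsp;<br />&thinsp;</p>\n"
    . "<p>&ensp;</p>\n"
    . "</div>";

echo removeEmptyTags($input), "\n";
```

The removed paragraphs leave their line breaks behind, so the output contains blank lines where they used to be. Also note that a tag holding only plain whitespace, with no entity or line break tag, is not matched by this pattern, because the whitespace is only consumed around one of the listed codes.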
Answer 2:
Use tidy. It can be called through the following function:
function cleaning($string, $tidyConfig = null) {
    $out = array();
    // Default tidy options, used when the caller passes no configuration.
    $config = array(
        'indent' => true,
        'show-body-only' => false,
        'clean' => true,
        'output-xhtml' => true,
        'preserve-entities' => true
    );
    if ($tidyConfig === null) {
        $tidyConfig = $config;
    }
    // Let tidy repair the markup and return a complete XHTML document.
    $tidy = new tidy();
    $out['full'] = $tidy->repairString($string, $tidyConfig, 'UTF8');
    // Strip everything outside <body>...</body>.
    $out['body'] = preg_replace("/.*<body[^>]*>|<\/body>.*/si", "", $out['full']);
    // Keep only what sits between <style> and </style>, re-wrapped in a fresh <style> tag.
    $out['style'] = '<style type="text/css">' . preg_replace("/.*<style[^>]*>|<\/style>.*/si", "", $out['full']) . '</style>';
    return $out;
}
Answer 3:
I'm not very good with regex, but try this:
\<.*\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\<\s*br\s*\/\>\s*\&.*sp;\s*\<\/.*\>|\<.*\>\s*\&.*sp;\s*\<\s*br\s*\/\>\<\/.*\>
Basically, it matches:
- Tags with HTML space entities in them, OR
- Tags with a break before the HTML space entities in them, OR
- Tags with a break after the HTML space entities in them
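One caveat may be worth checking before relying on this pattern: the `.*` parts are greedy, so when several tags share a line, a match can stretch across tag boundaries. The sketch below uses only the first alternative of the pattern and a made-up one-line input to show the effect.

```php
<?php
// First alternative of the pattern above, wrapped in / delimiters.
$pattern = '/\<.*\>\s*\&.*sp;\s*\<\/.*\>/';

// Two paragraphs on one line: only the second one is "empty".
$html = '<p>This is not empty.</p><p>&nbsp;</p>';

// The greedy .* after the first < runs up to the second <p>,
// so the whole line is matched and removed.
$result = preg_replace($pattern, '', $html);

var_dump($result); // string(0) ""
```

Restricting the wildcards inside the tags, for example `[^>]*` instead of `.*`, keeps a match from crossing into neighbouring elements.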
Source: https://stackoverflow.com/questions/30865464/how-to-remove-empty-html-tags-wich-containing-whitespaces-and-or-their-html-cod