Question
Here is the line of code I have which works great:
$content = htmlspecialchars($_POST['content'], ENT_QUOTES);
But what I would like to do is allow only certain types of HTML code to pass through without getting converted. Here is the list of HTML code that I would like to have pass:
<pre> </pre>
<b> </b>
<em> </em>
<u> </u>
<ul> </ul>
<li> </li>
<ol> </ol>
And as I go, I would like to also be able to add in more HTML later as I think of it. Could someone help me modify the code above so that the specified list of HTML codes above can pass through without getting converted?
Answer 1:
I suppose you could do it after the fact:
// $str is the result of htmlspecialchars()
$str = preg_replace('#&lt;(/?(?:pre|b|em|u|ul|li|ol))&gt;#', '<\1>', $str);
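It converts the encoded versions of <xx> and </xx> (that is, &lt;xx&gt; and &lt;/xx&gt;) back into real tags, where xx is in a controlled set of allowed tags. Tags that carry attributes are not matched, so they stay escaped.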
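Since the question asks for a list that can grow over time, the pattern can be built from an array of tag names. The following is only a sketch of that idea: the function name escapeExceptTags and the sample input are made up for illustration and are not part of the answer above.

```php
<?php
// Hypothetical helper: escape everything, then restore a whitelist of bare tags.
function escapeExceptTags(string $html, array $allowedTags): string
{
    // Step 1: escape the whole input, exactly as in the question.
    $escaped = htmlspecialchars($html, ENT_QUOTES);

    // Step 2: build the alternation (pre|b|em|...) from the list.
    $names = array();
    foreach ($allowedTags as $tag) {
        $names[] = preg_quote($tag, '#');
    }

    // Step 3: turn &lt;tag&gt; and &lt;/tag&gt; back into <tag> and </tag>.
    // Anything with attributes does not match and stays escaped.
    $pattern = '#&lt;(/?(?:' . implode('|', $names) . '))&gt;#';
    return preg_replace($pattern, '<$1>', $escaped);
}

$input = '<b>bold</b> <script>alert(1)</script> <b onclick="x">no</b>';
echo escapeExceptTags($input, array('pre', 'b', 'em', 'u', 'ul', 'li', 'ol'));
// prints: <b>bold</b> &lt;script&gt;alert(1)&lt;/script&gt; &lt;b onclick=&quot;x&quot;&gt;no</b>
```

Adding a tag later then only means appending its name to the array passed in.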
Answer 2:
Or you can go with old style:
$content = htmlspecialchars($_POST['content'], ENT_QUOTES);
$turned = array( '&lt;pre&gt;', '&lt;/pre&gt;', '&lt;b&gt;', '&lt;/b&gt;', '&lt;em&gt;', '&lt;/em&gt;', '&lt;u&gt;', '&lt;/u&gt;', '&lt;ul&gt;', '&lt;/ul&gt;', '&lt;li&gt;', '&lt;/li&gt;', '&lt;ol&gt;', '&lt;/ol&gt;' );
$turn_back = array( '<pre>', '</pre>', '<b>', '</b>', '<em>', '</em>', '<u>', '</u>', '<ul>', '</ul>', '<li>', '</li>', '<ol>', '</ol>' );
$content = str_replace( $turned, $turn_back, $content );
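The two arrays have to stay in step with each other, which gets tedious as the list grows. As a rough sketch (the helper name restoreTags is an assumption, not something from the answer), both arrays can be generated from a single list of tag names:

```php
<?php
// Hypothetical helper: build the search/replace pairs from a list of tag names.
function restoreTags(string $escaped, array $tags): string
{
    $search = array();
    $replace = array();
    foreach ($tags as $tag) {
        foreach (array("<$tag>", "</$tag>") as $literal) {
            // Escaped form to look for, e.g. &lt;b&gt;
            $search[] = htmlspecialchars($literal, ENT_QUOTES);
            // Real tag to put back, e.g. <b>
            $replace[] = $literal;
        }
    }
    return str_replace($search, $replace, $escaped);
}

$content = htmlspecialchars('<ul><li>one</li></ul><i>x</i>', ENT_QUOTES);
echo restoreTags($content, array('ul', 'li'));
// prints: <ul><li>one</li></ul>&lt;i&gt;x&lt;/i&gt;
```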
Answer 3:
I improved the way Jack attacks this issue. I added support for <br>, <br/> and anchor tags. The code first restores the quotes in href=&quot;...&quot; so that href is the only attribute allowed through.
$str = preg_replace(
array('#href=&quot;(.*)&quot;#', '#&lt;(/?(?:pre|a|b|br|em|u|ul|li|ol)(\shref=".*")?/?)&gt;#' ),
array( 'href="\1"', '<\1>' ),
$str
);
Answer 4:
I liked Elwin's solution, but you probably want to:
- Prevent javascript: URLs in the href, or more likely, allow only http(s) URLs.
- Make the regex globs non-greedy in case there are multiple <a href> tags in the content.
Here is the updated version:
$str = preg_replace(
array('#href=&quot;(https?://.*?)&quot;#', '#&lt;(/?(?:pre|a|b|br|em|u|ul|li|ol)(\shref=".*?")?/?)&gt;#' ),
array( 'href="\1"', '<\1>' ),
$str
);
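One thing to watch with this approach: even a non-greedy .*? can stretch past the first closing quote if that is the only way for the rest of the pattern to match, so a tag with two href attributes and something unquoted in between (for example onclick=alert(1)) may still be restored. The sketch below is a variation, not the answer's code: it swaps .*? for [^"]* in the second pattern so the href value cannot span a quote. The function name restoreTagsAndLinks and the sample inputs are assumptions for illustration.

```php
<?php
// Hypothetical wrapper around the two-step replacement from this answer,
// using [^"]* instead of .*? when matching the restored href value.
function restoreTagsAndLinks(string $html): string
{
    $str = htmlspecialchars($html, ENT_QUOTES);
    return preg_replace(
        array(
            // Step 1: put real quotes back, but only around http(s) URLs.
            '#href=&quot;(https?://.*?)&quot;#',
            // Step 2: restore allowed tags, optionally with one href="..." attribute.
            '#&lt;(/?(?:pre|a|b|br|em|u|ul|li|ol)(\shref="[^"]*")?/?)&gt;#',
        ),
        array('href="\1"', '<\1>'),
        $str
    );
}

// A normal link is restored.
echo restoreTagsAndLinks('<a href="https://example.com/?a=1&b=2">ok</a>'), "\n";
// prints: <a href="https://example.com/?a=1&amp;b=2">ok</a>

// A javascript: URL never gets its quotes back, so the opening tag stays escaped.
echo restoreTagsAndLinks('<a href="javascript:alert(1)">x</a>'), "\n";
// prints: &lt;a href=&quot;javascript:alert(1)&quot;&gt;x</a>

// An extra attribute between two hrefs keeps the opening tag escaped as well.
echo restoreTagsAndLinks('<a href="http://a" onclick=alert(1) href="http://b">x</a>'), "\n";
```

Note that a closing </a> is restored even when its opening tag was rejected, which is the same behavior as the original patterns.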
Answer 5:
I made this function to sanitize all HTML special characters except for the HTML tags specified.
It first uses htmlspecialchars() to make the string safe, then it reverts the tags I want to be untouched.
The function strips attributes from the allowed tags by default ($filterAttributes = true). You can turn that off to let attributes through, but be careful about doing so if you care about possible XSS attacks.
I know regex is not efficient but for moderate string lengths it should be fine. You can check the regex I used here https://regex101.com/r/U6GQse/8
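function sanitizeHtml($string, $safeHtmlTags = array('b', 'i', 'u', 'br'), $filterAttributes = true)
{
    // Escape everything first so the string is safe by default.
    $string = htmlspecialchars($string);
    if ($filterAttributes) {
        // Drop the attribute group ($3) when restoring a tag.
        $replace = '<$1$2$4>';
    } else {
        // Keep whatever followed the tag name, attributes included.
        $replace = '<$1$2$3$4>';
    }
    // Match the escaped form of an allowed tag, e.g. &lt;b&gt; or &lt;/b&gt;.
    $string = preg_replace('/&lt;\s*(\/?\s*)(' . implode('|', $safeHtmlTags) . ')(\s?|\s+[\s\S]*?)(\/)?\s*&gt;/', $replace, $string);
    return $string;
}
// Example usage to answer the OP question
$str = "MY HTML CONTENT";
echo sanitizeHtml($str, array('pre', 'b', 'em', 'u', 'ul', 'li', 'ol'));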
Answer 6:
You could use strip_tags. Note that it removes the tags that are not allowed instead of escaping them. The second argument lists the allowed opening tags:
$allowedTags = '<pre><b><em><u><ul><li><ol>';
$content = strip_tags($_POST['content'], $allowedTags);
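To make the difference from the htmlspecialchars() approaches concrete, here is a small sketch with a made-up input string. It shows two things about strip_tags(): the text inside a removed tag is kept, and attributes on an allowed tag are left as they are.

```php
<?php
// Made-up sample input for illustration.
$input = '<b onclick="x">hi</b><script>alert(1)</script>';

// Only <b> is allowed; the <script> tags are removed, their text remains,
// and the onclick attribute on <b> is not touched.
$stripped = strip_tags($input, '<b>');

echo $stripped;
// prints: <b onclick="x">hi</b>alert(1)
```

Because attributes survive on allowed tags, this approach on its own is probably not enough when the input is untrusted.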
Source: https://stackoverflow.com/questions/12819804/how-do-i-use-htmlspecialchars-but-allow-only-specific-html-code-to-pass-through