How Do I use htmlspecialchars but allow only specific HTML code to pass through without getting converted?

Submitted by 雨燕双飞 on 2019-12-19 05:12:21

Question


Here is the line of code I have which works great:

$content = htmlspecialchars($_POST['content'], ENT_QUOTES);

What I would like to do, though, is let certain HTML tags pass through without being converted. Here is the list of tags I want to allow:

<pre> </pre>
<b> </b>
<em> </em>
<u> </u>
<ul> </ul>
<li> </li>
<ol> </ol>

I would also like to be able to add more tags to this list later as I think of them. Could someone help me modify the code above so that the tags listed above pass through without being converted?
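To see the problem concretely: htmlspecialchars() escapes every tag, including the ones that should survive. A literal string stands in for $_POST['content'] in this small sketch.

```php
<?php
// A literal string stands in for $_POST['content'].
$content = htmlspecialchars('<b>bold</b> & <em>"hi"</em>', ENT_QUOTES);

// Prints: &lt;b&gt;bold&lt;/b&gt; &amp; &lt;em&gt;&quot;hi&quot;&lt;/em&gt;
echo $content, PHP_EOL;
```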


Answer 1:


I suppose you could do it after the fact:

// $str is the result of htmlspecialchars()
$str = preg_replace('#&lt;(/?(?:pre|b|em|u|ul|li|ol))&gt;#', '<\1>', $str);

It turns the encoded forms of <xx> and </xx> back into real tags when xx is one of a controlled set of allowed tag names. Tags that carry attributes, such as <b onclick="...">, stay escaped.
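Since the question also asks for an easy way to add tags later, the same idea can be wrapped so the whitelist is a plain array. This is a minimal sketch assuming PHP 7+; the function name escape_except_tags is hypothetical, and preg_quote() is added so each tag name is safe inside the pattern.

```php
<?php
// Hypothetical helper generalizing the regex above: escape everything, then
// turn bare <tag> / </tag> back into markup for whitelisted names only.
function escape_except_tags(string $input, array $allowedTags): string
{
    $escaped = htmlspecialchars($input, ENT_QUOTES);
    if (count($allowedTags) === 0) {
        return $escaped; // nothing to restore
    }
    // preg_quote() each name so it is safe inside the regex alternation.
    $quoted = array_map(function ($tag) {
        return preg_quote($tag, '#');
    }, $allowedTags);
    $pattern = '#&lt;(/?(?:' . implode('|', $quoted) . '))&gt;#';
    // Only attribute-free tags match, so "<b onclick=...>" stays escaped.
    return preg_replace($pattern, '<$1>', $escaped);
}

// Adding a tag later only means extending this array.
$allowed = ['pre', 'b', 'em', 'u', 'ul', 'li', 'ol'];
echo escape_except_tags('<b>bold</b> <script>alert(1)</script>', $allowed), PHP_EOL;
```

The <script> tag is not in the list, so it comes out as &lt;script&gt;...&lt;/script&gt; while <b> is restored.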




Answer 2:


Or you can do it the old-fashioned way, with str_replace():

$content = htmlspecialchars($_POST['content'], ENT_QUOTES);

$turned = array( '&lt;pre&gt;', '&lt;/pre&gt;', '&lt;b&gt;', '&lt;/b&gt;', '&lt;em&gt;', '&lt;/em&gt;', '&lt;u&gt;', '&lt;/u&gt;', '&lt;ul&gt;', '&lt;/ul&gt;', '&lt;li&gt;', '&lt;/li&gt;', '&lt;ol&gt;', '&lt;/ol&gt;' );
$turn_back = array( '<pre>', '</pre>', '<b>', '</b>', '<em>', '</em>', '<u>', '</u>', '<ul>', '</ul>', '<li>', '</li>', '<ol>', '</ol>' );

$content = str_replace( $turned, $turn_back, $content );
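The two arrays above must be kept in sync by hand. A small sketch (assuming the same str_replace() approach; the $allowedTags name and the literal input are illustrative) builds both arrays from a single tag list, so adding a tag later is a one-word change.

```php
<?php
// Build the $turned / $turn_back pairs from one list of tag names.
$allowedTags = ['pre', 'b', 'em', 'u', 'ul', 'li', 'ol'];

$turned = [];
$turn_back = [];
foreach ($allowedTags as $tag) {
    // Opening tag: escaped form and raw form.
    $turned[]    = '&lt;' . $tag . '&gt;';
    $turn_back[] = '<' . $tag . '>';
    // Closing tag: escaped form and raw form.
    $turned[]    = '&lt;/' . $tag . '&gt;';
    $turn_back[] = '</' . $tag . '>';
}

// A literal string stands in for $_POST['content'].
$content = htmlspecialchars('<em>hi</em> <i>no</i>', ENT_QUOTES);
$content = str_replace($turned, $turn_back, $content);
echo $content, PHP_EOL;
```

Here <em> is restored and <i>, which is not in the list, stays escaped.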



Answer 3:


I improved on Jack's approach by adding support for <br>, <br/> and anchor tags. The code first turns href=&quot;...&quot; back into href="...", so that href is the only attribute allowed.

$str = preg_replace(
    array('#href=&quot;(.*)&quot;#', '#&lt;(/?(?:pre|a|b|br|em|u|ul|li|ol)(\shref=".*")?/?)&gt;#' ), 
    array( 'href="\1"', '<\1>' ), 
    $str
);
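To see why the greedy `.*` matters, here are this answer's two patterns applied unchanged to input with two links (the input string is illustrative). The first pattern stretches from the first href=&quot; to the last &quot;, so both links collapse into one tag with a garbled href.

```php
<?php
// This answer's patterns, unchanged, on content containing two links.
$input = '<a href="http://a">A</a> <a href="http://b">B</a>';
$str = htmlspecialchars($input, ENT_QUOTES);
$str = preg_replace(
    array('#href=&quot;(.*)&quot;#', '#&lt;(/?(?:pre|a|b|br|em|u|ul|li|ol)(\shref=".*")?/?)&gt;#'),
    array('href="\1"', '<\1>'),
    $str
);

// Prints: <a href="http://a&quot;&gt;A&lt;/a&gt; &lt;a href=&quot;http://b">B</a>
echo $str, PHP_EOL;
```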



Answer 4:


I liked Elwin's solution, but you probably want to:

  1. Prevent javascript: URLs in the href, or better still, allow only http(s) URLs.
  2. Make the regex quantifiers non-greedy in case there are multiple <a href> tags in the content.

Here is the updated version:

$str = preg_replace(
    array('#href=&quot;(https?://.*?)&quot;#', '#&lt;(/?(?:pre|a|b|br|em|u|ul|li|ol)(\shref=".*?")?/?)&gt;#' ), 
    array( 'href="\1"', '<\1>' ), 
    $str
);
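A quick check of both fixes, with the patterns wrapped in a function for reuse (the name restore_allowed_tags and the sample inputs are hypothetical). Two links now come out as two separate tags, a javascript: link stays escaped, and <br/> is restored.

```php
<?php
// This answer's patterns, wrapped in a (hypothetically named) function.
function restore_allowed_tags(string $str): string
{
    return preg_replace(
        array('#href=&quot;(https?://.*?)&quot;#', '#&lt;(/?(?:pre|a|b|br|em|u|ul|li|ol)(\shref=".*?")?/?)&gt;#'),
        array('href="\1"', '<\1>'),
        $str
    );
}

// Non-greedy matching keeps the two links apart.
$twoLinks = restore_allowed_tags(htmlspecialchars('<a href="http://a">A</a> <a href="http://b">B</a>', ENT_QUOTES));

// Only http(s) hrefs are restored; this opening tag stays escaped.
$jsLink = restore_allowed_tags(htmlspecialchars('<a href="javascript:alert(1)">x</a>', ENT_QUOTES));

// Self-closing <br/> is restored too.
$br = restore_allowed_tags(htmlspecialchars('a<br/>b', ENT_QUOTES));

echo $twoLinks, PHP_EOL, $jsLink, PHP_EOL, $br, PHP_EOL;
```

One side effect: closing tags are restored independently of their opening tags, so the rejected link still leaves a bare </a> behind.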



Answer 5:


I wrote this function to escape all HTML special characters except for the HTML tags you specify.

It first runs htmlspecialchars() to make the string safe, then restores the tags that should be left untouched.

The function can optionally strip attributes from the restored tags. Keep that filtering enabled (the default) if you care about XSS attacks; with it disabled, attributes such as onclick are restored along with the tag.

I know regex is not the most efficient tool, but for moderate string lengths it should be fine. You can inspect the regex I used at https://regex101.com/r/U6GQse/8

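function sanitizeHtml($string, $safeHtmlTags = array('b','i','u','br'), $filterAttributes = true)
{
    // Escape everything first, so only the tags restored below are live HTML.
    $string = htmlspecialchars($string);

    if ($filterAttributes) {
        // Drop group 3 (the attributes): <b onclick=...> becomes <b>.
        $replace = "<$1$2$4>";
    } else {
        // Keep group 3: attributes pass through, which is unsafe against XSS.
        $replace = "<$1$2$3$4>";
    }
    $string = preg_replace("/&lt;\s*(\/?\s*)(".implode("|", $safeHtmlTags).")(\s?|\s+[\s\S]*?)(\/)?\s*&gt;/", $replace, $string);

    return $string;
}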

// Example usage to answer the OP's question
$str = "MY HTML CONTENT";
echo sanitizeHtml($str, array('pre','b','em','u','ul','li','ol'));
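A quick check of what the $filterAttributes switch does. The function is this answer's, copied without the `public` keyword so it runs as a standalone script; the sample input is illustrative.

```php
<?php
// This answer's function, minus `public` so it runs outside a class.
function sanitizeHtml($string, $safeHtmlTags = array('b','i','u','br'), $filterAttributes = true)
{
    $string = htmlspecialchars($string);
    // "$1" etc. stay literal: PHP variable names cannot start with a digit.
    $replace = $filterAttributes ? "<$1$2$4>" : "<$1$2$3$4>";
    return preg_replace("/&lt;\s*(\/?\s*)(".implode("|", $safeHtmlTags).")(\s?|\s+[\s\S]*?)(\/)?\s*&gt;/", $replace, $string);
}

$input = '<b onclick=alert(1)>x</b><script>y</script>';

// Filtering on (default): the onclick attribute is dropped.
$filtered = sanitizeHtml($input, array('b'));

// Filtering off: the attribute survives as a live event handler.
$unfiltered = sanitizeHtml($input, array('b'), false);

echo $filtered, PHP_EOL, $unfiltered, PHP_EOL;
```

In both cases <script> stays escaped because it is not in the allowed list; only the handling of attributes on <b> differs.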



Answer 6:


You could use strip_tags():

$exceptionString = '<pre><b><em><u><ul><li><ol>';

$content = strip_tags($_POST['content'],$exceptionString );
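Note that strip_tags() behaves differently from the htmlspecialchars() approaches above: disallowed tags are removed rather than escaped, and attributes on allowed tags are left untouched. A small sketch, assuming PHP 7.4 or later (where the allowed tags may also be passed as an array of names); the inputs are illustrative.

```php
<?php
// PHP 7.4+: allowed tags may be given as an array of tag names.
$allowed = ['pre', 'b', 'em', 'u', 'ul', 'li', 'ol'];

// Disallowed tags are removed, not escaped: <i> disappears, its text stays.
$stripped = strip_tags('<b>bold</b> <i>italic</i>', $allowed);

// Attributes on allowed tags pass through unchanged.
$withAttr = strip_tags('<b onmouseover="evil">x</b>', $allowed);

echo $stripped, PHP_EOL, $withAttr, PHP_EOL;
```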


Source: https://stackoverflow.com/questions/12819804/how-do-i-use-htmlspecialchars-but-allow-only-specific-html-code-to-pass-through
