I am using HTML Purifier to protect my application from XSS attacks. Currently I am purifying content from WYSIWYG editors because that is the only place where users are all
HTML Purifier takes HTML as input and produces HTML as output. Its purpose is to let the user enter HTML with some tags, attributes, and values while filtering out others. It uses a whitelist to strip anything that could carry a script, which makes it useful for something like a WYSIWYG editor.
Usernames and passwords, on the other hand, are not HTML. They're plain text, so HTML Purifier is not an option. Trying to use HTML Purifier here would either corrupt the data or allow XSS attacks.
For example, it lets the following through unchanged, which can cause XSS issues when inserted as an attribute value in some elements:
" onclick="javascript:alert()" href="
Or if someone tried to use special symbols in their password, and entered:
<password
then their password would become blank, and make it much easier to guess.
Instead, you should encode the text. The encoding required depends on the context, but you can use htmlentities when outputting these values, provided you stick to rule #0 and rule #1 of the OWASP XSS Prevention Cheat Sheet.
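As a rough illustration of output encoding, here is a minimal sketch that runs the two example strings from above through an encoder. It uses htmlspecialchars rather than htmlentities (both encode the HTML special characters that matter here), and the helper name encode_for_html is hypothetical. It only covers HTML body text and quoted attribute values, not JavaScript, CSS, or URL contexts.

```php
<?php
// Hypothetical helper: encode plain text for HTML body text or a quoted attribute value.
function encode_for_html(string $text): string
{
    // ENT_QUOTES encodes both double and single quotes.
    // ENT_SUBSTITUTE replaces invalid byte sequences instead of returning an empty string.
    return htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
}

$attack   = '" onclick="javascript:alert()" href="';
$password = '<password';

// The quotes become &quot;, so the value can no longer break out of the attribute.
echo '<a title="' . encode_for_html($attack) . '">profile</a>', PHP_EOL;

// The "<" becomes &lt;, so the text is preserved instead of being stripped.
echo '<p>' . encode_for_html($password) . '</p>', PHP_EOL;
```

Unlike purifying, encoding keeps every character the user typed; it only changes how the text is represented in the page.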
XSS risks exist wherever data entered by one user may be viewed by other users. Even if this data isn't currently viewable, don't assume that a need to display it won't arise.
As far as the username and password go, you should never display a password, or even store it in a form that can be displayed (i.e. hash it, for example with sha1(), or preferably with a dedicated password hashing function such as password_hash()). For usernames, restrict the legal characters to something like [A-Za-z0-9_]. Finally, as the other answer suggests, use your language's HTML entity encoding function for any entered data that may contain reserved or special HTML characters, which prevents this data from causing syntax errors when displayed.
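The two rules above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name is_valid_username and the 3 to 20 character length limit are made up for the example, and it uses PHP's built-in password_hash() and password_verify() in place of sha1(), since those are designed for password storage.

```php
<?php
// Hypothetical helper: accept only usernames built from [A-Za-z0-9_].
function is_valid_username(string $username): bool
{
    // \A and \z anchor the match to the whole string.
    // The 3-20 length limit is an assumption for this sketch.
    return preg_match('/\A[A-Za-z0-9_]{3,20}\z/', $username) === 1;
}

// Store only a one-way hash of the password, never the password itself.
// The password is hashed as typed, special characters included.
$hash = password_hash('<password', PASSWORD_DEFAULT);

var_dump(is_valid_username('alice_01'));        // allowed characters only
var_dump(is_valid_username('"><script>'));      // contains characters outside the whitelist
var_dump(password_verify('<password', $hash));  // the original password still matches
var_dump(password_verify('', $hash));           // a blank password does not
```

Because the password is only ever hashed and compared, it never needs purifying or encoding, and it never reaches a page.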
You should purify anything that could ever be displayed on a page, because in XSS attacks, attackers insert <script> tags or other malicious tags that can link to other sites.
Passwords and emails should be fine. Passwords should never be shown and emails should have their own validator to make sure that they are in the proper format.
Finally, always remember to apply htmlentities() to content when you output it.
Oh, and look at filter_var as well. It is a very nice way of filtering variables.
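For the email case mentioned above, a minimal sketch with filter_var might look like this. The wrapper name validate_email is hypothetical; FILTER_VALIDATE_EMAIL is PHP's built-in email validation filter, which returns the input on success and false on failure.

```php
<?php
// Hypothetical helper: return the email if it is well-formed, otherwise null.
function validate_email(string $input): ?string
{
    $result = filter_var($input, FILTER_VALIDATE_EMAIL);
    return $result === false ? null : $result;
}

var_dump(validate_email('user@example.com'));           // well-formed address
var_dump(validate_email('<script>alert(1)</script>'));  // not an email address
```

Validation of this kind checks the format only, so the value should still be encoded when it is written into a page.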
No, I wouldn't use HTML Purifier on the username and password during login authentication. In my applications I use alphanumeric usernames with an input validation filter, and display them with htmlspecialchars with ENT_QUOTES. This is very effective and a hell of a lot faster than HTML Purifier. I have yet to see an XSS attack using an alphanumeric string. And by the way, HTML Purifier has nothing to remove from alphanumeric content anyway, so if you force the input string through an alphanumeric filter there is no point in running it through HTML Purifier before display. As for passwords, they should never be displayed to anybody in the first place, which eliminates the possibility of XSS. And if for some perverse reason you do want to display passwords, then you should design your application so that only the owner of a password can see it; otherwise you are screwed big time and XSS is the least of your worries!