You should refer to the excellent OWASP website for a summary of attacks (including XSS) and defenses against them. Here's the simplest explanation I could come up with, which might actually be more readable than their web page (but is probably nowhere near as complete).
Specifying a charset. First of all, ensure that your web page specifies the UTF-8 charset in the HTTP headers or at the very beginning of the `head` element. Otherwise, a UTF-7 attack in Internet Explorer (and older versions of Firefox) can succeed despite your other efforts to prevent XSS, such as HTML-escaping all input.
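As a minimal sketch, assuming a plain Python WSGI application (the `app` function and the page content are hypothetical, not part of the original answer), the charset can be declared both in the `Content-Type` header and in a `<meta charset>` tag placed first inside `head`:

```python
# Hypothetical page: <meta charset> is the first element inside <head>.
PAGE = """<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<title>Example</title>
</head>
<body><p>Hello</p></body>
</html>
"""


def app(environ, start_response):
    # Declaring the charset in the HTTP header means the browser never has to guess it.
    headers = [("Content-Type", "text/html; charset=utf-8")]
    start_response("200 OK", headers)
    # Encode the body with the same charset that the header announces.
    return [PAGE.encode("utf-8")]
```

For a quick local check, it can be served with `wsgiref.simple_server.make_server("", 8000, app).serve_forever()`. Setting the header is the more robust of the two, since the `meta` tag only works if it appears early enough in the document.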
HTML escaping. Keep in mind that you need to HTML-escape all user input. This includes replacing `<` with `&lt;`, `>` with `&gt;`, `&` with `&amp;` and `"` with `&quot;`. If you will ever use single-quoted HTML attributes, you need to replace `'` with `&#39;` as well. Typical server-side scripting languages such as PHP provide functions to do this (e.g. `htmlspecialchars`), and I encourage you to build on these by creating standard functions to insert HTML elements rather than inserting them in an ad-hoc manner.
Other types of escaping. You still, however, need to be careful never to insert user input as an unquoted attribute value or into an attribute that is interpreted as JavaScript (e.g. `onload` or `onmouseover`). This also applies to `script` elements unless the input is properly JavaScript-escaped, which is different from HTML escaping. Another special type of escaping is URL escaping for URL parameters: do it before the HTML escaping to properly include a parameter in a link.
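A sketch of both escapes in Python, assuming the hypothetical helpers `js_string` and `link`. `json.dumps` produces a valid JavaScript string literal; additionally escaping `<`, `>` and `&` keeps a value such as `</script>` from closing the `script` element early. For links, the parameters are URL-escaped first (`urllib.parse.urlencode` with `quote`), and the finished URL is then HTML-escaped for the attribute.

```python
import html
import json
from urllib.parse import quote, urlencode


def js_string(value: str) -> str:
    """Return a JavaScript string literal for use inside a <script> element."""
    # json.dumps yields a double-quoted literal; with the default ensure_ascii=True,
    # non-ASCII characters (including U+2028/U+2029) become \uXXXX escapes.
    literal = json.dumps(value)
    # Escape characters that could end the <script> element or start markup.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


def link(path: str, params: dict, text: str) -> str:
    # Step 1: URL-escape each parameter (spaces become %20, & becomes %26, ...).
    href = path + "?" + urlencode(params, quote_via=quote)
    # Step 2: HTML-escape the finished URL for the attribute (& becomes &amp;).
    return '<a href="' + html.escape(href) + '">' + html.escape(text) + "</a>"


print("<script>var q = " + js_string("</script><b>") + ";</script>")
print(link("/search", {"q": "a&b c", "lang": "en"}, "x<y"))
```

The JavaScript literal is meant for a `script` element only; as noted above, event-handler attributes should not receive user input at all.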
Validating URLs and CSS values. The same caution applies to the URLs of links and images: validate them against approved prefixes, because otherwise an attacker can supply a `javascript:` URL. It also applies to CSS stylesheet URLs and to data within `style` attributes. (Internet Explorer allows JavaScript expressions as CSS values, and Firefox is similarly problematic with its XBL support.) If you must include a CSS value from an untrusted source, strictly validate it or CSS-escape it.
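A minimal sketch of prefix-based URL validation and strict CSS validation in Python. The allowed prefixes, the choice of hex colours as the only accepted CSS values, and the names `is_safe_url` and `safe_css_color` are all illustrative assumptions; a real site would pick its own allowlist.

```python
import re

# Assumed allowlist: absolute http(s) URLs and site-relative paths only.
ALLOWED_URL_PREFIXES = ("https://", "http://", "/")


def is_safe_url(url: str) -> bool:
    # "//host" and "/\host" start with "/" but browsers resolve them to another host.
    if url.startswith(("//", "/\\")):
        return False
    # Allowlist, not blocklist: "javascript:", "data:", " javascript:" etc. all fail.
    return url.startswith(ALLOWED_URL_PREFIXES)


# Strict CSS colour check: #rgb or #rrggbb and nothing else.
_HEX_COLOR = re.compile(r"#[0-9a-fA-F]{3}(?:[0-9a-fA-F]{3})?")


def safe_css_color(value: str, default: str = "#000000") -> str:
    # fullmatch rejects any extra characters such as ";", "(", quotes or spaces.
    return value if _HEX_COLOR.fullmatch(value) else default
```

Checking for approved prefixes (rather than trying to detect `javascript:`) sidesteps tricks such as mixed case or leading whitespace, which browsers may tolerate.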
Not allowing user-provided HTML. If you have the option, do not allow user-provided HTML at all. It is an easy way to end up with an XSS problem, and so is writing a "parser" for your own markup language based on simple regex substitutions. I would only allow formatted text if the HTML output were generated in an obviously safe manner by a real parser that escapes all text from the input using the standard escaping functions and builds each HTML element individually. If you have no choice in the matter, use a validator/sanitizer such as AntiSamy.
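As a toy illustration of that principle (not a replacement for a real markup library or a sanitizer), here is a hypothetical `render_emphasis` function for a made-up markup where `*text*` means emphasis. The input is split into tokens, every text fragment goes through `html.escape`, and the only tags in the output are ones the code builds itself.

```python
import html


def render_emphasis(text: str) -> str:
    """Render the hypothetical *emphasis* markup to HTML."""
    # Split on the delimiter; odd-indexed segments lay between a pair of '*'.
    parts = text.split("*")
    if len(parts) % 2 == 0:
        # Unbalanced '*': treat the whole input as plain text.
        return html.escape(text)
    out = []
    for i, part in enumerate(parts):
        escaped = html.escape(part)  # user text is always escaped
        out.append(f"<em>{escaped}</em>" if i % 2 else escaped)
    return "".join(out)


print(render_emphasis("a *b<c* d"))
```

Compare this with a regex substitution such as replacing `\*(.*?)\*` by `<em>\1</em>`, which copies the captured text into the output unescaped.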
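Preventing DOM-based XSS. Do not put user input into HTML code that your JavaScript generates and then inserts into the document (for example, through `innerHTML`). Instead, use the proper DOM methods, such as setting `textContent` or calling `document.createTextNode`, so that the input is processed as text, not HTML.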
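In the browser, the safe pattern is to create a text node (or assign `textContent`) instead of concatenating user input into an HTML string. The sketch below only illustrates the same principle with Python's standard `xml.dom.minidom`, whose `createElement`, `createTextNode` and `appendChild` methods are modeled on the W3C DOM; `comment_paragraph` is a hypothetical name.

```python
from xml.dom import minidom


def comment_paragraph(user_text: str) -> str:
    doc = minidom.Document()
    p = doc.createElement("p")
    # The input becomes a text node; it is never parsed as markup.
    p.appendChild(doc.createTextNode(user_text))
    # On serialization, &, < and > inside the text node are escaped.
    return p.toxml()


print(comment_paragraph("<img src=x onerror=alert(1)>"))
```

The attacker's `<img>` tag comes out as literal text inside the paragraph rather than as an element with an `onerror` handler.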
Obviously, I cannot cover every single case in which an attacker can insert JavaScript code. In general, HTTP-only cookies can make some XSS attacks a bit harder, because script cannot read them through `document.cookie`, but they by no means prevent XSS. Giving programmers security training is essential.
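A sketch of issuing an HTTP-only session cookie with Python's standard `http.cookies.SimpleCookie`. The cookie name `session` and the helper `session_cookie_header` are hypothetical. As the paragraph says, this only limits damage: injected script still runs and can send requests that carry the cookie.

```python
from http.cookies import SimpleCookie


def session_cookie_header(session_id: str) -> str:
    cookie = SimpleCookie()
    cookie["session"] = session_id
    # HttpOnly: the cookie is not exposed to document.cookie, so injected
    # script cannot read and exfiltrate it directly.
    cookie["session"]["httponly"] = True
    # Secure: the browser only sends the cookie over HTTPS.
    cookie["session"]["secure"] = True
    cookie["session"]["path"] = "/"
    # Produces a complete "Set-Cookie: ..." header line.
    return cookie.output()


print(session_cookie_header("abc123"))
```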