I have been reading that you HTML encode on the way back from the server to the client (I think?) and this will prevent many types of XSS attacks. However, I don't understa
Think about it: What does encoded HTML look like? For example, it could look like this:
&lt;a href=&quot;www.stackoverflow.com&quot;&gt;
The client renders this as literal text (as <a href="www.stackoverflow.com">), not as HTML. That means you won't see an actual link, only the code itself.
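As a minimal sketch of what that encoding step might look like on the server, here is the same tag passed through Python's standard-library html.escape. The choice of Python and the variable names are assumptions made for illustration; the answer itself does not name a language or library.

```python
import html

# The untrusted markup from the example.
raw = '<a href="www.stackoverflow.com">'

# html.escape replaces &, < and >; with quote=True (the default)
# it also replaces double and single quotes.
encoded = html.escape(raw)

print(encoded)  # &lt;a href=&quot;www.stackoverflow.com&quot;&gt;
```

A browser that receives the encoded string displays the original characters as text and does not build a link element from them.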
XSS attacks work on the basis that someone can make a client browser parse HTML that the site provider didn't intend to be on there; if the above weren't encoded, it would mean that the provided link would be embedded in the site, although the site provider didn't want that.
XSS is of course a little more elaborate than that, and usually involves JavaScript as well (hence the name Cross-Site Scripting), but for demonstration purposes this simple example should suffice. It works the same way with JavaScript code as with simple HTML tags, since XSS is a special case of the more general HTML injection.
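To illustrate the JavaScript case, the sketch below embeds a user-supplied comment in a page fragment with and without encoding. The render_comment helper and the payload are hypothetical, invented for this example; a real application would normally rely on its template engine's auto-escaping.

```python
import html

def render_comment(comment: str, encode: bool = True) -> str:
    """Hypothetical helper: embed a user comment in an HTML fragment."""
    body = html.escape(comment) if encode else comment
    return f"<p>{body}</p>"

# A typical injection attempt submitted as a "comment".
payload = "<script>alert('xss')</script>"

unsafe = render_comment(payload, encode=False)  # browser would run the script
safe = render_comment(payload)                  # browser shows the text only

print(unsafe)  # <p><script>alert('xss')</script></p>
print(safe)    # <p>&lt;script&gt;alert(&#x27;xss&#x27;)&lt;/script&gt;</p>
```

In the unencoded version the browser sees a real script element and executes it. In the encoded version it sees only character data inside the paragraph.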
HTML encoding turns <div> into &lt;div&gt;, which means that any HTML markup will be displayed on the page as text, rather than interpreted as HTML markup.
The basic entities that are converted are:
& to &amp;
< to &lt;
> to &gt;
" to &quot;
OWASP recommends encoding some additional characters:
' to &#x27;
/ to &#x2F;
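A minimal sketch of an encoder covering all six conversions (the four basic entities plus the two additional ones) might look like the following. ENTITY_TABLE and encode_for_html are hypothetical names chosen for this example, not part of any library. In practice a vetted library function is preferable to a hand-written one.

```python
# Hypothetical table: the four basic entities plus the two extra
# characters (single quote and forward slash).
ENTITY_TABLE = str.maketrans({
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#x27;",
    "/": "&#x2F;",
})

def encode_for_html(text: str) -> str:
    """Replace each special character in a single pass over the input."""
    return text.translate(ENTITY_TABLE)

print(encode_for_html("</div>"))  # &lt;&#x2F;div&gt;
```

Using str.translate handles every character exactly once, so the & inside an entity that was just produced is never encoded a second time. Chained replace calls would need to handle & first to avoid that problem.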
These encodings are how you textually represent characters that would otherwise be consumed as markup. If you wanted to write a<b you'd have to be careful that <b isn't treated like an HTML element. If you use a&lt;b the text that will be displayed to the user will be a<b.
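The round trip can be checked with Python's standard html module, used here purely as an illustration: html.escape produces the form you put in the page source, and html.unescape approximates what the browser shows the user.

```python
import html

text = "a<b"

# What goes into the page source.
encoded = html.escape(text)

# Roughly what the browser displays after decoding the entity.
decoded = html.unescape(encoded)

print(encoded)  # a&lt;b
print(decoded)  # a<b
```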