Does HTML encoding prevent XSS security exploits?

Question

By simply converting the following ("the big 5"):

& -> &amp;
< -> &lt;
> -> &gt;
" -> &#034;
' -> &#039;

Will you prevent XSS attacks?

I think you also need to whitelist at the character level to prevent certain attacks, but the following answer states that this overcomplicates matters.

EDIT: This page explains that it does not prevent more elaborate injections, does not help with "out of range characters = question marks" when outputting Strings to Writers with single-byte encodings, and does not prevent character reinterpretation when the user switches the browser encoding for a displayed page. In essence, escaping just these characters seems to be quite a naive approach.

Answer 1:

Will you prevent XSS attacks?

If you do this escaping at the right time(*) then yes, you will prevent HTML-injection. This is the most common form of XSS attack. It is not just a matter of security: you need to do the escapes anyway so that strings containing those characters display correctly. The issue of security is a subset of the issue of correctness.

I think you need to white list at a character level too, to prevent certain attacks

No. HTML-escaping will render every one of those attacks as inactive plain text on the page, which is what you want. The attacks on that page demonstrate different ways to do HTML-injection that can get around the stupider “XSS filters” that some servers deploy to try to prevent common HTML-injection attacks. This demonstrates that “XSS filters” are inherently leaky and ineffective.
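As a minimal sketch of that point, the snippet below runs a few filter-evasion style strings through Python's standard html.escape. The payloads and the helper name escape_for_html are illustrative assumptions, not taken from the page referenced in the question. Note that html.escape emits &quot; and &#x27; for the quote characters rather than the numeric forms listed in the question; both spellings are equivalent character references.

```python
import html

# Illustrative payloads in the style of XSS filter-evasion lists (assumed examples).
payloads = [
    '<script>alert(1)</script>',
    '"><img src=x onerror=alert(1)>',
    "' onmouseover='alert(1)",
]


def escape_for_html(text: str) -> str:
    """Escape the five HTML metacharacters: & < > " and '."""
    # quote=True (the default) also escapes both quote characters,
    # which matters when the text lands inside an attribute value.
    return html.escape(text, quote=True)


for payload in payloads:
    # Each payload comes out as inert text with no markup characters left.
    print(escape_for_html(payload))
```

After escaping, none of the strings contains a raw <, >, " or ', so the browser has nothing to parse as a tag or an attribute boundary.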

There are other forms of XSS attack that might or might not affect you, for example bad schemes on user-submitted URIs (javascript: et al), injection of code into data echoed into a JavaScript block (where you need JSON-style escaping) or into stylesheets or HTTP response headers (again, you always need the appropriate form of encoding when you drop text into another context; you should always be suspicious if you see anything with unescaped interpolation like PHP's "string $var string").
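Two of these cases can be sketched in Python under stated assumptions. The names is_safe_link, json_for_script and ALLOWED_SCHEMES are hypothetical, and the allowlist contents are an assumption to adjust per application. The first helper accepts only URLs with an allowlisted scheme (so relative URLs are rejected too, which is deliberately conservative); the second serializes data as JSON and additionally escapes <, > and &, since json.dumps leaves those untouched and a literal </script> inside a string would otherwise end the script element.

```python
import json
from urllib.parse import urlsplit

# Assumption: the schemes this hypothetical application is willing to link to.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_safe_link(url: str) -> bool:
    """Return True only if the URL has an explicitly allowed scheme."""
    # Browsers tolerate whitespace and control characters around or inside
    # the scheme, so strip them before inspecting it.
    cleaned = "".join(ch for ch in url if ch > " " and ch != "\x7f")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        # urlsplit raises on some malformed inputs; treat those as unsafe.
        return False
    return scheme in ALLOWED_SCHEMES


def json_for_script(value) -> str:
    """Serialize value as JSON that can sit inside a <script> element."""
    text = json.dumps(value)  # ensure_ascii=True also escapes U+2028/U+2029
    # <, > and & can only occur inside JSON strings, where \uXXXX is valid.
    return (text.replace("<", "\\u003c")
                .replace(">", "\\u003e")
                .replace("&", "\\u0026"))


print(is_safe_link("https://example.com/profile"))
print(is_safe_link("JaVaScRiPt:alert(1)"))
print(json_for_script({"name": "</script><script>alert(1)</script>"}))
```

The JSON output still parses back to the original value, so the receiving script sees the data unchanged.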

Then there's file upload handling, Flash origin policy, UTF-8 overlong sequences in legacy browsers, and application-level content generation issues; all of these can potentially lead to cross-site scripting. But HTML injection is the main one that every web application will face, and most PHP applications get wrong today.

(*: which is when inserting text content into HTML, and at no other time. Do not HTML-escape form submission data in $_POST/$_GET at the start of your script; this is a common wrong-headed mistake.)
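A small sketch of why the timing matters, written in Python rather than PHP and using a hypothetical render_comment function: if the value is escaped when it arrives and then escaped again on output (as it should be), the page shows literal entity text, and the stored value is also wrong for every non-HTML use.

```python
import html


def render_comment(raw_comment: str) -> str:
    """Escape at the moment the text is inserted into HTML, and only then."""
    return "<p>" + html.escape(raw_comment, quote=True) + "</p>"


raw = "Tom & Jerry <3"

# Mistake: escaping on input means the stored value is no longer the user's text.
stored_escaped = html.escape(raw)

# Preferred: keep the raw value and escape it only in render_comment.
stored_raw = raw

print(render_comment(stored_escaped))  # double-escaped output
print(render_comment(stored_raw))      # correctly escaped output
```

The stored_escaped value also has a different length from what the user typed and would be wrong if written to a plain-text email, a JSON response or a database search index.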

Answer 2:

OWASP has a great cheat sheet.

Golden Rules
Strategies
Etc.

https://github.com/OWASP/CheatSheetSeries/blob/master/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.md

Answer 3:

Countermeasures depend on the context into which the data is inserted. If you insert the data into HTML, replacing the HTML metacharacters with escape sequences (i.e. character references) prevents the insertion of HTML code.

But if you're in another context (e.g. an HTML attribute value that is interpreted as a URL), there are additional metacharacters with different escape sequences that you have to deal with.
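To illustrate the nested-context case, here is a minimal Python sketch with a hypothetical search_link helper and an assumed /search URL layout. The user's term is first percent-encoded for the URL query component, and the finished URL is then HTML-escaped for the attribute value.

```python
import html
from urllib.parse import quote


def search_link(term: str) -> str:
    """Build a link whose href carries user-supplied text in the query string."""
    # Layer 1: percent-encode the term for the URL context.
    # safe="" makes quote() encode "/" as well.
    url = "/search?q=" + quote(term, safe="") + "&page=1"
    # Layer 2: HTML-escape the whole URL for the attribute-value context.
    return '<a href="' + html.escape(url, quote=True) + '">results</a>'


print(search_link('a&b "c"'))
```

Each layer handles its own metacharacters: the & and " inside the term become %26 and %22, while the & that separates the two query parameters becomes &amp; in the HTML.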

Source: https://stackoverflow.com/questions/2334863/does-html-encoding-prevent-xss-security-exploits

Tags

xss

escaping