Question
Say that I want to provide some data to my client (in the first response, with no latency) via a dynamic <script> element:
<script><%= payload %></script>
Say that payload is the string
var data = '</script><script>alert("Muahahaha!")';</script>
An end tag (</script>) will allow users to inject arbitrary scripts into my page. How do I properly sanitize the contents of my script element?
I figure I could change </script> to <\/script> and <!-- to <\!--. Are there any other dangerous strings I need to escape? Is there a better way to provide this "cold start" data?
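The two replacements proposed above can be sketched as a small JavaScript function. It assumes the text ends up inside a JavaScript string literal, where \/ and \! are harmless identity escapes that leave the value unchanged, and that quotes and backslashes are escaped separately; the name escapeScriptBreakers is hypothetical. The </script match is case-insensitive because HTML end tags are.

```javascript
// Hypothetical helper: applies only the two replacements proposed in the question.
// Assumes the result is embedded inside a JavaScript string literal and that
// quotes and backslashes in the input are escaped elsewhere.
function escapeScriptBreakers(text) {
  return text
    // "</script" in any letter case becomes "<\/script"; "\/" is just "/" in JS.
    .replace(/<\/(script)/gi, "<\\/$1")
    // "<!--" becomes "<\!--"; "\!" is just "!" in JS.
    .replace(/<!--/g, "<\\!--");
}

const escaped = escapeScriptBreakers("</script><!--");
console.log(escaped); // <\/script><\!--
// Evaluating the escaped text as a string literal gives back the original value.
console.log(new Function("return '" + escaped + "'")()); // </script><!--
```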
Answer 1:
Edited for non-mutation of data.
If I'm interpreting this correctly, you want to prevent the user from ending the script tag prematurely within the user-submitted string. That can be done for HTML just as you stated, by adding the backslash to the end tag: <\/script>. That is the only escaping you should have to worry about in that case. You shouldn't need to escape HTML comments, as the browser will interpret them as part of the JavaScript. If some older browsers don't correctly treat script tags as type text/javascript by default (language="javascript" is deprecated), adding type='text/javascript' may be necessary.
Based on Mike Samuel's answer here, I may have been wrong about not needing to escape HTML comments. However, I was not able to reproduce the problem in Chrome or Chromium.
Answer 2:
Assuming that you're doing this, with payload set to
var data = '[this is user controlled data]';
and the rest of the code (assignment, quotes and semicolon) is generated by your application, then the encoding you want is hex entity encoding.
See the OWASP XSS Prevention Cheat Sheet, Rule #3 for more information. This will convert
</script><script>alert("Muahahaha!")
into
var data = '\x3c\x2fscript\x3e\x3cscript\x3ealert\x28\x22Muahahaha\x21\x22\x29';
Try this and you will see that it has the advantage of storing the user-set string exactly, no matter what characters it contains. Additionally, it takes care of single- and double-quote encoding. As a bonus, it is also suitable for use inside HTML attributes:
<a onclick="alert('[user data]');" />
which would normally have to be HTML-encoded again for correct display (because &amp; inside an HTML attribute is interpreted as &). However, hex entity encoding does not produce any characters with special meaning in HTML, so you get two for the price of one.
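A minimal JavaScript sketch of the hex encoding described above, following the OWASP rule of leaving ASCII letters and digits alone and escaping every other character below 256 as \xHH. Emitting characters at or above 256 as \uHHHH is an assumption about how to extend the rule, not something the answer specifies. The names hexEncodeForJsString and userData are hypothetical.

```javascript
// Hypothetical helper: escapes a value for use inside a quoted JavaScript string.
// ASCII letters and digits pass through; every other UTF-16 code unit is escaped.
function hexEncodeForJsString(value) {
  let out = "";
  for (let i = 0; i < value.length; i++) {
    const ch = value[i];
    const code = value.charCodeAt(i);
    if (/[A-Za-z0-9]/.test(ch)) {
      out += ch;
    } else if (code < 256) {
      // \xHH with two lowercase hex digits, matching the example output above
      out += "\\x" + code.toString(16).padStart(2, "0");
    } else {
      // Assumption: code units >= 256 (including surrogate halves) use \uHHHH
      out += "\\u" + code.toString(16).padStart(4, "0");
    }
  }
  return out;
}

const userData = '</script><script>alert("Muahahaha!")';
const encoded = hexEncodeForJsString(userData);
console.log("var data = '" + encoded + "';");
// var data = '\x3c\x2fscript\x3e\x3cscript\x3ealert\x28\x22Muahahaha\x21\x22\x29';

// The encoded text contains no <, >, &, " or ', so it can also sit inside
// a double-quoted HTML attribute without a second round of encoding.
console.log('<a onclick="alert(\'' + encoded + '\');">click</a>');
```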
Update from comments
The OP indicated that the server-side code would be generated in the form
var data = <%= JSON.stringify(data) %>;
The above still applies. It is up to the JSON class to properly hex entity encode values as they're inserted into the JSON. This cannot easily be done outside of the class, as you'd effectively have to parse the JSON again to determine the current language context. I wouldn't recommend going for the simple option of escaping the forward slash in </script>, because there are other sequences that can end the grammar context, such as CDATA closing tags (]]>). Escape properly and your code will be future-proof and secure.
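For the JSON.stringify form in the update, note that JSON.stringify on its own does not escape < or /. One common workaround, sketched here under the assumption that the template is rendered by JavaScript on the server, relies on the fact that <, >, & and the separators U+2028/U+2029 can only occur inside string values in JSON text, so each can be replaced with a \u escape without changing the parsed value. The name toScriptSafeJson is hypothetical, and this is one approach rather than the only correct one.

```javascript
// Hypothetical helper: serializes a value as JSON that can be placed
// between <script> and </script> as a JavaScript expression.
function toScriptSafeJson(value) {
  const json = JSON.stringify(value);
  if (json === undefined) {
    throw new TypeError("value cannot be serialized as JSON");
  }
  // In JSON text these characters only appear inside string values,
  // where a \uXXXX escape decodes to the same character.
  // "<" breaks up </script and <!--; ">" breaks up the CDATA end ]]>;
  // "&" is escaped as a precaution for XHTML and attribute contexts;
  // U+2028/U+2029 are line terminators to pre-ES2019 JavaScript engines.
  return json.replace(/[<>&\u2028\u2029]/g, (ch) =>
    "\\u" + ch.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

const data = { note: "</script><script>alert(\"Muahahaha!\")" };
console.log("var data = " + toScriptSafeJson(data) + ";");
// var data = {"note":"\u003c/script\u003e\u003cscript\u003ealert(\"Muahahaha!\")"};
```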
Source: https://stackoverflow.com/questions/32803709/sanitize-script-element-contents