Question
I am implementing an XSS filter for my web application and also using the ESAPI encoder to sanitise the input.
The patterns I am using are given below:
// Script fragments
Pattern.compile("<script>(.*?)</script>", Pattern.CASE_INSENSITIVE),
// src='...'
Pattern.compile("src[\r\n]*=[\r\n]*\\\'(.*?)\\\'", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
Pattern.compile("src[\r\n]*=[\r\n]*\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// lonely script tags
Pattern.compile("</script>", Pattern.CASE_INSENSITIVE),
Pattern.compile("<script(.*?)>", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// eval(...)
Pattern.compile("eval\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// expression(...)
Pattern.compile("expression\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
// javascript:...
Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
// vbscript:...
Pattern.compile("vbscript:", Pattern.CASE_INSENSITIVE),
// onload(...)=...
Pattern.compile("onload(.*?)=", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL)
But a few scripts are still not getting filtered, especially ones that are appended to a parameter, like
url?sourceId=abx;alert('hello');
How do I handle these?
Answer 1:
This isn't the right approach. It's mathematically impossible to write a regex that correctly filters out XSS: regular expressions can only describe regular languages, while HTML and JavaScript are both defined by context-free grammars.
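To make the point concrete, here is a minimal sketch that runs the patterns from the question against a few inputs. The class name `RegexFilterBypassDemo` and the method `stripXss` are hypothetical, and the way the patterns are applied (each match replaced with an empty string) is an assumption, since the question does not show that part. The payload that uses `onerror` instead of `onload`, and the `alert('hello')` fragment from the question, pass through unchanged.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: the patterns are copied from the question,
// the replace-with-empty-string loop is an assumption about how they are used.
class RegexFilterBypassDemo {

    private static final Pattern[] PATTERNS = {
        Pattern.compile("<script>(.*?)</script>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("src[\r\n]*=[\r\n]*\\\'(.*?)\\\'", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
        Pattern.compile("src[\r\n]*=[\r\n]*\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
        Pattern.compile("</script>", Pattern.CASE_INSENSITIVE),
        Pattern.compile("<script(.*?)>", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
        Pattern.compile("eval\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
        Pattern.compile("expression\\((.*?)\\)", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL),
        Pattern.compile("javascript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("vbscript:", Pattern.CASE_INSENSITIVE),
        Pattern.compile("onload(.*?)=", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE | Pattern.DOTALL)
    };

    // Applies every pattern in turn and deletes whatever it matches.
    static String stripXss(String value) {
        String result = value;
        for (Pattern pattern : PATTERNS) {
            result = pattern.matcher(result).replaceAll("");
        }
        return result;
    }

    public static void main(String[] args) {
        // The textbook payload is removed entirely.
        System.out.println("[" + stripXss("<script>alert(1)</script>") + "]");
        // A different event handler is not covered by any pattern and survives.
        System.out.println("[" + stripXss("<img src=x onerror=alert(1)>") + "]");
        // The value from the question contains nothing the patterns look for.
        System.out.println("[" + stripXss("abx;alert('hello');") + "]");
    }
}
```

Adding an `onerror` pattern would only move the problem to the next handler or encoding trick, which is why a blacklist of regexes never converges.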
You can, however, guarantee that whenever you switch contexts (that is, hand off a piece of data that is going to be interpreted), the data is correctly escaped for that context. So, when sending data to a browser, escape it for HTML if it's being handled as HTML, or for JavaScript if it's being handled by JavaScript.
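As a rough illustration of escaping at the context switch, here is a minimal sketch of an HTML escaper using only the standard library. The class `HtmlEscapeSketch` and the method `escapeHtml` are hypothetical names, and this covers only HTML element content and quoted attribute values; data placed inside a script block or a URL needs a different encoder. Since you already use ESAPI, its `ESAPI.encoder().encodeForHTML(...)` and `encodeForJavaScript(...)` methods are probably the better choice in real code.

```java
// Hypothetical sketch: escapes the five characters that are significant
// in HTML text and quoted attribute values. Not a replacement for a vetted encoder.
class HtmlEscapeSketch {

    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#x27;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The raw value is stored and processed as-is; it is escaped only
        // at the moment it is written into an HTML page.
        String sourceId = "abx;<script>alert('hello');</script>";
        System.out.println("<p>Source: " + escapeHtml(sourceId) + "</p>");
    }
}
```

With this approach the input does not have to be "cleaned" at all: the browser renders the escaped value as text instead of interpreting it as markup.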
If you DO need to allow HTML/JavaScript into your application, then you'll want a web application firewall or a framework like HDIV.
Answer 2:
You can combine ESAPI and JSoup to clear out all the XSS vulnerabilities. I would definitely avoid trying to manually write all the regexes when other libraries are built to handle this for you.
Here is an XSS filter implementation for Jersey 2.x: How to Modify QueryParam and PathParam in Jersey 2
Source: https://stackoverflow.com/questions/31308968/xss-filter-to-remove-all-scripts