I've got the common situation where I've got user input that uses a subset of HTML (input with TinyMCE). I need to have some server-side protection against XSS attacks an
I had the exact same problem a few years back when I was using TinyMCE.
There still don't seem to be any decent XSS / HTML white-listing solutions for .NET, so I've uploaded a solution I created and have been using for a few years.
http://www.codeproject.com/KB/aspnet/html-white-listing.aspx
The white-list definition is based on TinyMCE's valid-elements.
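As a rough illustration of what a valid-elements-style white list boils down to, here is a minimal Python sketch that turns a spec string into a map from tag to allowed attributes. It is not the CodeProject code (which is .NET). It assumes only the basic comma-separated `tag[attr|attr]` form; TinyMCE's real syntax has more features (wildcards, aliases, default values) that this ignores. `parse_valid_elements` is a hypothetical name.

```python
def parse_valid_elements(spec):
    """Parse a simplified 'tag[attr|attr],tag' spec into {tag: frozenset(allowed attrs)}.

    Only this basic form is handled; TinyMCE's full valid-elements syntax has
    more features (wildcards, aliases, defaults) that are ignored here.
    """
    rules = {}
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # "a[href|title]" -> tag "a", rest "href|title]"; "strong" -> rest ""
        tag, _, rest = item.partition("[")
        attrs = rest.rstrip("]").split("|") if rest else []
        rules[tag.strip().lower()] = frozenset(
            a.strip().lower() for a in attrs if a.strip()
        )
    return rules


rules = parse_valid_elements("a[href|title],strong,em,p,img[src|alt]")
# rules["a"] holds {"href", "title"}; rules["strong"] is an empty frozenset
```

Anything missing from the resulting map (e.g. `script`) is simply not allowed, which is the whole point of a white list.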
Take Two: Looking around, Microsoft have recently released a white-list based Anti-XSS Library (V3.0), check that out:
The Microsoft Anti-Cross Site Scripting Library V3.0 (Anti-XSS V3.0) is an encoding library designed to help developers protect their ASP.NET web-based applications from XSS attacks. It differs from most encoding libraries in that it uses the white-listing technique -- sometimes referred to as the principle of inclusions -- to provide protection against XSS attacks. This approach works by first defining a valid or allowable set of characters, and encodes anything outside this set (invalid characters or potential attacks). The white-listing approach provides several advantages over other encoding schemes. New features in this version of the Microsoft Anti-Cross Site Scripting Library include:
- An expanded white list that supports more languages
- Performance improvements
- Performance data sheets (in the online help)
- Support for Shift_JIS encoding for mobile browsers
- A sample application
- Security Runtime Engine (SRE) HTTP module
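To make the "define an allowable set of characters and encode everything else" idea concrete, here is a minimal Python sketch. The `SAFE_CHARS` set is an assumption chosen for illustration and is not AntiXSS's actual white list (which covers many more languages); `whitelist_encode` is a hypothetical name, not part of that library.

```python
import string

# Illustrative set of characters emitted verbatim; everything else is encoded.
# (An assumption for this sketch, NOT AntiXSS's real white list.)
SAFE_CHARS = frozenset(string.ascii_letters + string.digits + " .,-_")


def whitelist_encode(text):
    """Encode every character outside SAFE_CHARS as a decimal character reference."""
    return "".join(ch if ch in SAFE_CHARS else "&#{};".format(ord(ch)) for ch in text)


print(whitelist_encode('<b onclick="x">Hi</b>'))
# &#60;b onclick&#61;&#34;x&#34;&#62;Hi&#60;&#47;b&#62;
```

Because the default is to encode, characters nobody thought about (quotes, backticks, `<`, non-ASCII letters) come out as character references instead of slipping through. Note that this encodes text; on its own it does not let a subset of HTML through.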
We are using the HtmlSanitizer .Net library, which:
Also on NuGet
Microsoft has an open-source library to protect against XSS: AntiXSS.
https://github.com/Vereyon/HtmlRuleSanitizer solves exactly this problem.
I had this challenge when integrating the wysihtml5 editor into an ASP.NET MVC application. I noticed that it had a very nice yet simple white-list-based sanitizer which used rules to allow a subset of HTML to pass through. I implemented a server-side version of it, which depends on the HTML Agility Pack for parsing.
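For a feel of how such a rule-based sanitizer works, here is a rough Python sketch. It is not HtmlRuleSanitizer's code (that is .NET on top of the HTML Agility Pack); it uses the standard library's `html.parser` instead. The rule-table format with `keep`/`rename`/`remove` actions is hypothetical and only loosely modeled on wysihtml5-style rules, and `RuleSanitizer`/`sanitize` are illustrative names.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical rule table, loosely modeled on wysihtml5-style parser rules:
#   "keep"   -> emit the tag, keeping only the listed attributes
#   "rename" -> emit it as another tag, using that tag's rule
#   "remove" -> drop the tag AND everything inside it
# Tags not listed at all are dropped, but their text content is kept.
RULES = {
    "p": {"action": "keep", "attrs": set()},
    "strong": {"action": "keep", "attrs": set()},
    "em": {"action": "keep", "attrs": set()},
    "b": {"action": "rename", "to": "strong"},
    "i": {"action": "rename", "to": "em"},
    "a": {"action": "keep", "attrs": {"href", "title"}},
    "script": {"action": "remove"},
    "style": {"action": "remove"},
}


class RuleSanitizer(HTMLParser):
    def __init__(self, rules):
        super().__init__(convert_charrefs=True)
        self.rules = rules
        self.out = []
        self.skip_depth = 0  # > 0 while inside a "remove" element

    def _resolve(self, tag):
        """Return (output tag name, rule), following a rename; (None, None) if unlisted."""
        rule = self.rules.get(tag)
        if rule is None:
            return None, None
        if rule["action"] == "rename":
            target = rule["to"]
            return target, self.rules[target]
        return tag, rule

    def handle_starttag(self, tag, attrs):
        name, rule = self._resolve(tag)
        if rule is not None and rule["action"] == "remove":
            self.skip_depth += 1
            return
        if self.skip_depth or rule is None:
            return
        # Keep only white-listed attribute names; re-escape their values.
        kept = "".join(
            ' {}="{}"'.format(k, escape(v or "", quote=True))
            for k, v in attrs
            if k in rule["attrs"]
        )
        self.out.append("<{}{}>".format(name, kept))

    def handle_endtag(self, tag):
        name, rule = self._resolve(tag)
        if rule is not None and rule["action"] == "remove":
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or rule is None:
            return
        self.out.append("</{}>".format(name))

    def handle_data(self, data):
        # Text is re-escaped, so stray "<" or "&" can't become markup.
        if not self.skip_depth:
            self.out.append(escape(data))


def sanitize(html_text, rules=RULES):
    parser = RuleSanitizer(rules)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


print(sanitize('<p onclick="x()">Hi <b>there</b><script>alert(1)</script></p>'))
# <p>Hi <strong>there</strong></p>
```

Unlisted tags are dropped but keep their text, while `remove` tags lose their contents too. This sketch does not validate attribute values, so an allowed `href` could still carry a `javascript:` URL.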
Microsoft Web Protection Library (formerly AntiXSS) seems to simply rip out almost all HTML tags, and from what I read, you cannot easily tailor the rules to the HTML subset you want to use. So that was not an option for me.
This HTML sanitizer also looks very promising and would be my second choice.
You can download a version from http://www.microsoft.com/en-us/download/details.aspx?id=28589, but I linked it mainly for the useful DOCX file. My preferred method is to use the NuGet package manager to get the latest AntiXSS package.
You can use the HtmlSanitizationLibrary assembly found in the 4.x AntiXss library. Note that GetSafeHtml() is in the HtmlSanitizationLibrary, under Microsoft.Security.Application.Sanitizer.
Well, if you want to parse and you're worried about invalid (X)HTML coming in, then the HTML Agility Pack is probably the best thing to use for parsing. Remember, though, that it's not just elements: you also need to decide which attributes to allow on the allowed elements. Of course, you should work to a whitelist of allowed elements and their attributes, rather than try to strip things that might be dodgy via a blacklist.
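Following that advice, the attribute side can be treated as its own white list, including a check on URL-valued attributes, since an allowed `href` is itself an attack vector (`javascript:` URLs). Below is a minimal Python sketch assuming attributes arrive as `(name, value)` pairs, the way a parser such as Python's `html.parser` delivers them. The table contents, the scheme list, and the function names are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical white list: element -> attributes allowed on that element.
ALLOWED_ATTRS = {
    "a": {"href", "title"},
    "img": {"src", "alt"},
    "p": set(),
}
URL_ATTRS = {"href", "src"}                     # attributes whose values are URLs
SAFE_SCHEMES = {"http", "https", "mailto", ""}  # "" = relative URL, no scheme


def is_safe_url(value):
    """True if the URL's scheme is white-listed (so javascript:, data:, etc. are rejected)."""
    # Drop spaces and control characters first, since browsers ignore some of
    # them inside URLs (e.g. "java\tscript:alert(1)").
    cleaned = "".join(ch for ch in value if ch > " ")
    try:
        scheme = urlsplit(cleaned).scheme
    except ValueError:  # unparseable URL: treat as unsafe
        return False
    return scheme.lower() in SAFE_SCHEMES


def filter_attrs(tag, attrs):
    """Keep only white-listed (name, value) pairs; return None if the tag itself isn't allowed."""
    allowed = ALLOWED_ATTRS.get(tag.lower())
    if allowed is None:
        return None
    kept = []
    for name, value in attrs:
        name = name.lower()
        if name not in allowed or value is None:
            continue
        if name in URL_ATTRS and not is_safe_url(value):
            continue
        kept.append((name, value))
    return kept
```

For example, `filter_attrs("a", [("href", "javascript:alert(1)"), ("title", "t")])` keeps only the title, and `filter_attrs("script", [])` returns `None` so the caller drops the element entirely.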
There's also the OWASP AntiSamy project, which is an ongoing work in progress; they also have a test site you can try to XSS.
Regex for this is probably too risky IMO.
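To show why, here is a deliberately naive blacklist regex in Python and three inputs it fails on. The pattern and payloads are illustrative only.

```python
import re

# A naive BLACKLIST filter: delete <script>...</script> blocks with a regex.
# Shown only to illustrate why this is fragile; do not use it.
NAIVE_SCRIPT_RE = re.compile(r"<script>.*?</script>", re.IGNORECASE | re.DOTALL)


def naive_strip(html_text):
    return NAIVE_SCRIPT_RE.sub("", html_text)


attacks = [
    '<img src=x onerror="alert(1)">',                    # no <script> tag at all
    "<scr<script></script>ipt>alert(1)</script>",        # the removal reassembles a tag
    '<script type="text/javascript">alert(1)</script>',  # an attribute dodges the literal
]
for payload in attacks:
    print(naive_strip(payload))  # every payload still contains something executable
```

The second case is the classic trap: deleting the match glues `<scr` and `ipt>` back together into a fresh `<script>` tag.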