Best way to handle security and avoid XSS with user entered URLs


Question


We have a high security application and we want to allow users to enter URLs that other users will see.

This introduces a high risk of XSS hacks - a user could potentially enter JavaScript that another user ends up executing. Since we hold sensitive data, it's essential that this never happens.

What are the best practices in dealing with this? Is any security whitelist or escape pattern alone good enough?

Is there any advice on dealing with redirections (for instance, a "this link goes outside our site" message on a warning page before following the link)?

Is there an argument for not supporting user entered links at all?


Clarification:

Basically our users want to input:

stackoverflow.com

And have it output to another user:

<a href=\"http://stackoverflow.com\">stackoverflow.com</a>

What I really worry about is their using this in an XSS hack, i.e., they input:

javascript:alert('hacked!');

So other users get this link:

<a href=\"alert(\'hacked!\');\">stackoverflow.com</a>

My example is just to explain the risk - I'm well aware that JavaScript and URLs are different things, but by letting them input the latter they may be able to execute the former.

You'd be amazed how many sites you can break with this trick - HTML is even worse. If they know to deal with links, do they also know to sanitise <iframe>, <img> and clever CSS references?

I'm working in a high security environment - a single XSS hack could result in very high losses for us. I'm happy that I could produce a regex (or use one of the excellent suggestions so far) that could exclude everything I can think of, but would that be enough?


Answer 1:


If you think URLs can't contain code, think again!

https://www.owasp.org/index.php/XSS_Filter_Evasion_Cheat_Sheet

Read that, and weep.

Here's how we do it on Stack Overflow:

using System.Text.RegularExpressions;

/// <summary>
/// returns "safe" URL, stripping anything outside normal charsets for URL
/// </summary>
public static string SanitizeUrl(string url)
{
    return Regex.Replace(url, @"[^-A-Za-z0-9+&@#/%?=~_|!:,.;\(\)]", "");
}
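This is C#. As a usage sketch - the userInput variable and the HTML-encoding step are my additions, not part of the original answer - character stripping on input pairs with encoding on output, so the value cannot break out of the href attribute:

using System.Web;

// Strip disallowed characters, then HTML-encode when rendering the anchor.
string safeUrl = SanitizeUrl(userInput);
string html = "<a href=\"" + HttpUtility.HtmlAttributeEncode(safeUrl) + "\">"
            + HttpUtility.HtmlEncode(safeUrl) + "</a>";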



Answer 2:


The process of rendering a link "safe" should go through three or four steps:

  • Unescape/re-encode the string you've been given (RSnake has documented a number of tricks at http://ha.ckers.org/xss.html that use escaping and UTF encodings).
  • Clean the link up: regexes are a good start. Make sure to truncate the string or throw it away if it contains a " (or whatever you use to close the attributes in your output). If you're doing the links only as references to other information, you can also force the protocol at the end of this process: if the portion before the first colon is not 'http' or 'https', then prepend 'http://'. This allows you to create usable links from incomplete input, as a user would type into a browser, and gives you a last shot at tripping up whatever mischief someone has tried to sneak in.
  • Check that the result is a well-formed URL (protocol://host.domain[:port][/path][/[file]][?queryField=queryValue][#anchor]).
  • Possibly check the result against a site blacklist or try to fetch it through some sort of malware checker.

If security is a priority I would hope that the users would forgive a bit of paranoia in this process, even if it does end up throwing away some safe links.
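A rough C# sketch of the steps above (my illustration, not code from the original answer; the decode-loop bound and the http/https whitelist are assumptions):

using System;
using System.Web;

public static class LinkCleaner
{
    // Returns a cleaned absolute URL, or null if the input should be thrown away.
    public static string TryCleanUrl(string input)
    {
        // Step 1: decode until stable, to defeat double-encoding tricks.
        string url = input;
        for (int i = 0; i < 5; i++)
        {
            string decoded = HttpUtility.UrlDecode(url);
            if (decoded == url) break;
            url = decoded;
        }

        // Step 2: throw the value away if it could close an attribute.
        if (url.Contains("\"") || url.Contains("'"))
            return null;

        // Force the protocol: if the part before the first colon is not
        // http/https, prepend "http://" and let the URL parser trip it up.
        int colon = url.IndexOf(':');
        string scheme = colon < 0 ? "" : url.Substring(0, colon).ToLowerInvariant();
        if (scheme != "http" && scheme != "https")
            url = "http://" + url;

        // Step 3: check that the result is a well-formed absolute URL.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri parsed))
            return null;
        if (parsed.Scheme != Uri.UriSchemeHttp && parsed.Scheme != Uri.UriSchemeHttps)
            return null;

        return parsed.ToString();
    }
}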




Answer 3:


Use a library, such as the OWASP ESAPI API:

  • PHP - http://code.google.com/p/owasp-esapi-php/
  • Java - http://code.google.com/p/owasp-esapi-java/
  • .NET - http://code.google.com/p/owasp-esapi-dotnet/
  • Python - http://code.google.com/p/owasp-esapi-python/

Read the following:

  • https://www.golemtechnologies.com/articles/prevent-xss#how-to-prevent-cross-site-scripting
  • https://www.owasp.org/
  • http://www.secbytes.com/blog/?p=253

For example:

$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$esapi = new ESAPI( "/etc/php5/esapi/ESAPI.xml" ); // Modified copy of ESAPI.xml
$sanitizer = ESAPI::getSanitizer();
$sanitized_url = $sanitizer->getSanitizedURL( "user-homepage", $url );

Another option is a built-in function; PHP's filter_var is an example:

$url = "http://stackoverflow.com"; // e.g., $_GET["user-homepage"];
$sanitized_url = filter_var($url, FILTER_SANITIZE_URL);

Note that filter_var with FILTER_SANITIZE_URL only strips characters that are illegal in a URL; it still allows javascript: calls and does not restrict the scheme to http or https. Using the OWASP ESAPI Sanitizer is probably the best option.

Still another example is the code from WordPress:

  • http://core.trac.wordpress.org/browser/tags/3.5.1/wp-includes/formatting.php#L2561

Additionally, since there is no way of knowing where the URL points (i.e., it might be a valid URL, but its content could be malicious), Google has a Safe Browsing API you can call:

  • https://developers.google.com/safe-browsing/lookup_guide

Rolling your own regex for sanitization is problematic for several reasons:

  • Unless you are Jon Skeet, the code will have errors.
  • Existing APIs have many hours of review and testing behind them.
  • Existing URL-validation APIs consider internationalization.
  • Existing APIs will be kept up-to-date with emerging standards.

Other issues to consider:

  • What schemes do you permit (are file:/// and telnet:// acceptable)?
  • What restrictions do you want to place on the content of the URL (are malware URLs acceptable)?



Answer 4:


Just HTMLEncode the links when you output them. Make sure you don't allow javascript: links. (It's best to have a whitelist of protocols that are accepted, e.g., http, https, and mailto.)
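A minimal C# sketch of that advice (my illustration, not from the original answer; the allowed-scheme list follows the answer's examples):

using System;
using System.Linq;
using System.Web;

public static class LinkRenderer
{
    static readonly string[] AllowedSchemes = { "http", "https", "mailto" };

    public static string RenderLink(string url)
    {
        // Reject anything that is not an absolute URL with a whitelisted
        // scheme; this blocks javascript:, data:, vbscript:, and friends.
        if (!Uri.TryCreate(url, UriKind.Absolute, out Uri parsed) ||
            !AllowedSchemes.Contains(parsed.Scheme.ToLowerInvariant()))
            return HttpUtility.HtmlEncode(url); // fall back to plain text

        // HTML-encode on output so the value cannot break out of the markup.
        return "<a href=\"" + HttpUtility.HtmlAttributeEncode(parsed.ToString()) + "\">"
             + HttpUtility.HtmlEncode(parsed.ToString()) + "</a>";
    }
}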




Answer 5:


You don't specify the language of your application, so I will assume ASP.NET, for which you can use the Microsoft Anti-Cross Site Scripting Library.

It is very easy to use; all you need is an include and that is it :)

While you're on the topic, why not give Design Guidelines for Secure Web Applications a read?

If you're using any other language: if a library like this exists for ASP.NET, equivalents should be available for other languages as well (PHP, Python, RoR, etc.).
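For illustration, a minimal sketch of encoding a user-supplied URL with the AntiXSS library (my example, not from the original answer; it assumes the Microsoft.Security.Application.Encoder API from AntiXSS 4.x, and the userUrl variable is hypothetical):

using Microsoft.Security.Application;

// Encode for the attribute and text contexts. Note that encoding alone does
// not reject javascript: URLs - pair this with a scheme whitelist.
string html = "<a href=\"" + Encoder.HtmlAttributeEncode(userUrl) + "\">"
            + Encoder.HtmlEncode(userUrl) + "</a>";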




Answer 6:


How about not displaying them as a link? Just use the text.

Combined with a warning to proceed at their own risk, that may be enough.

Addition: see also "Should I sanitize HTML markup for a hosted CMS?" for a discussion of sanitizing user input.




Answer 7:


In my project, written in JavaScript, I use this regex as a whitelist:

 url.match(/^((https?|ftp):\/\/|\.{0,2}\/)/)

The only limitation is that you need to put ./ in front of files in the same directory, but I think I can live with that.
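For readers following the C# snippets elsewhere in this thread, a rough C# rendition of the same whitelist (my translation, not from the original answer):

using System.Text.RegularExpressions;

// Allows absolute http/https/ftp URLs and relative paths starting with
// "/", "./", or "../"; javascript: and data: URLs fail the match.
static bool IsAllowedUrl(string url) =>
    Regex.IsMatch(url, @"^((https?|ftp)://|\.{0,2}/)");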




Answer 8:


For Pythonistas, try Scrapy's w3lib.

OWASP ESAPI pre-dates Python 2.7 and is archived on the now-defunct Google Code.




Answer 9:


You could hex-encode the entire URL and send it to your server, so that the content is not readable at first glance. After checking the content, you could decode the URL and send it to the browser.
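A minimal sketch of the round-trip this describes (my illustration, not from the original answer; Convert.ToHexString requires .NET 5+, and note that hex-encoding only obscures the value in transit - the decoded URL still needs the validation described in the other answers):

using System;
using System.Text;

// Sender: hex-encode the URL before transmitting it.
string hex = Convert.ToHexString(Encoding.UTF8.GetBytes("http://stackoverflow.com"));

// Receiver: decode, then validate/sanitize as usual before emitting a link.
string url = Encoding.UTF8.GetString(Convert.FromHexString(hex));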




Answer 10:


Allowing a URL and allowing JavaScript are two different things.



Source: https://stackoverflow.com/questions/205923/best-way-to-handle-security-and-avoid-xss-with-user-entered-urls
