I want users to be able to submit a URL and then display that URL to other users as a link.
If I naively redisplay what the user submitted, I leave myself open to URLs
You can use the UrlValidator class from Apache Commons Validator:
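import org.apache.commons.validator.routines.UrlValidator;

String[] schemes = {"http", "https"}; // only accept these protocols
UrlValidator urlValidator = new UrlValidator(schemes);
if (urlValidator.isValid("http://somesite.com")) {
    // valid
}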
I think what you are looking for is output encoding. Have a look at OWASP ESAPI, which is a tried-and-tested way to perform encoding in Java.
Also, just a suggestion: if you want to check whether a user is submitting a malicious URL, you can check it against Google's malware database using the Safe Browsing API.
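For the Safe Browsing suggestion, here is a rough sketch of building a query against the v4 Lookup API (`threatMatches:find`), assuming Java 11+ for `java.net.http`. The class name, the `clientId`/`clientVersion` values and the chosen threat types are my own placeholders; check Google's current Safe Browsing documentation (there are newer API versions and usage terms) and supply your own API key. It only builds the request; you would send it with `HttpClient`, and as far as I know an empty `{}` response means no match while a `matches` array means a hit.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper: builds (but does not send) a Safe Browsing v4 Lookup API request.
final class SafeBrowsingRequest {
    // Assumed v4 endpoint; the API key goes in the "key" query parameter.
    private static final String ENDPOINT =
            "https://safebrowsing.googleapis.com/v4/threatMatches:find?key=";

    private SafeBrowsingRequest() {}

    // Quotes and escapes a string so it can be embedded as a JSON string value.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // JSON body asking whether a single URL is on the malware or phishing lists.
    // "example-app" and "1.0" are placeholder client identifiers.
    static String body(String url) {
        return "{\"client\":{\"clientId\":\"example-app\",\"clientVersion\":\"1.0\"},"
                + "\"threatInfo\":{"
                + "\"threatTypes\":[\"MALWARE\",\"SOCIAL_ENGINEERING\"],"
                + "\"platformTypes\":[\"ANY_PLATFORM\"],"
                + "\"threatEntryTypes\":[\"URL\"],"
                + "\"threatEntries\":[{\"url\":" + jsonString(url) + "}]}}";
    }

    // Wraps the body in a POST request; send it with java.net.http.HttpClient.
    static HttpRequest request(String apiKey, String url) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT + apiKey))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(url)))
                .build();
    }
}
```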
URLs containing ' are perfectly valid. If you are outputting them into an HTML document without escaping, then the problem lies in your missing HTML escaping, not in your input checking. You need to call an HTML-encoding method every time you output any variable text (including URLs) into an HTML document.
Java does not have a built-in HTML encoder (poor show!), but most web libraries do (take your pick, or write it yourself with a few string replaces). If you use JSTL tags, you get escapeXml to do it for free by default:
<a href="<c:out value="${link}"/>">ok</a>
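If you go the "write it yourself" route, a minimal sketch is below. It escapes the five characters that matter in HTML text and in double- or single-quoted attribute values, in a single pass so there is no risk of double-escaping the `&` that a chain of `replace` calls can cause if applied in the wrong order. `Html`, `escape` and `link` are hypothetical names, and this only covers HTML contexts, not JavaScript or CSS.

```java
// Hypothetical hand-rolled HTML escaper for text and quoted attribute values.
final class Html {
    private Html() {}

    // Replaces &, <, >, " and ' with their character references in one pass.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 16);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&#39;");  break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Builds an anchor with both the href value and the link text escaped.
    static String link(String url, String text) {
        return "<a href=\"" + escape(url) + "\">" + escape(text) + "</a>";
    }
}
```

Something like `Html.link(submittedUrl, submittedUrl)` would then produce the anchor; escaping alone does not stop a `javascript:` URL, so it still needs the protocol check described further on.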
Whilst your main problem is HTML escaping, it is still worth validating that an input URL is well formed to catch mistakes; you can do that by parsing it with new URL(...) and seeing whether you get a MalformedURLException.
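A minimal version of the `new URL(...)` check might look like this (the class and method names are just for illustration). Note that `java.net.URL` only accepts protocols it has handlers for, so it also rejects `javascript:`, but it happily accepts `file:`, `ftp:` and `mailto:`, which is why an explicit protocol check is still worthwhile. The `URL` constructors are deprecated as of Java 20 in favour of `URI.toURL()`, so on newer JDKs expect a deprecation warning.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper wrapping the "parse it and catch MalformedURLException" idea.
final class UrlCheck {
    private UrlCheck() {}

    // True if java.net.URL can parse the string with one of its installed protocol handlers.
    static boolean isWellFormed(String candidate) {
        try {
            new URL(candidate); // parses only; no network access happens here
            return true;
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```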
You should also check that the URL begins with a known-good protocol such as http:// or https://. This will prevent anyone from using dangerous URL protocols like javascript:, which can lead to cross-site scripting just as easily as HTML injection can.
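The protocol whitelist can be sketched with `java.net.URI`, assuming only `http` and `https` should ever be rendered as links. The scheme is compared case-insensitively, because browsers treat `JavaScript:` the same as `javascript:`, and anything `URI` cannot parse (for example a string with a leading space) is refused outright. `SafeLink` and `hasAllowedScheme` are hypothetical names; this complements HTML escaping rather than replacing it.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

// Hypothetical whitelist check for user-submitted link targets.
final class SafeLink {
    // Only these schemes may be rendered as links; javascript:, data:, vbscript: etc. are refused.
    private static final Set<String> ALLOWED_SCHEMES = Set.of("http", "https");

    private SafeLink() {}

    static boolean hasAllowedScheme(String candidate) {
        try {
            String scheme = new URI(candidate).getScheme();
            // Relative references have no scheme at all, so they are rejected too.
            return scheme != null
                    && ALLOWED_SCHEMES.contains(scheme.toLowerCase(Locale.ROOT));
        } catch (URISyntaxException e) {
            // Unparseable input (illegal characters, stray whitespace) is never linked.
            return false;
        }
    }
}
```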