I want to know if there is any way to parse a URL like this:
https://www.mysite.com/lot/of/unpleasant/folders/and/my/url with spaces &\"others\".xls
The problem with this sort of URL is that it is partially encoded: an out-of-the-box encoder will always encode the whole string, so I think your approach of using a custom encoder is correct. Your code is OK; you would just need to add some validation, for instance for the case where the "evil URL" comes without the protocol part (i.e. without the "https://"), unless you are sure that will never happen.
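As a rough idea of what that validation could look like, here is a minimal sketch. The class `SchemeCheck`, its methods, and the policy of falling back to "https://" are all assumptions for illustration, not something taken from your code:

```java
import java.util.regex.Pattern;

public class SchemeCheck {
    // A scheme is a letter followed by letters, digits, '+', '-' or '.', then "://".
    private static final Pattern SCHEME = Pattern.compile("^[A-Za-z][A-Za-z0-9+.-]*://");

    // True when the string starts with something like "https://".
    public static boolean hasScheme(String url) {
        return url != null && SCHEME.matcher(url).find();
    }

    // Assumed policy: prepend "https://" when the protocol part is missing.
    public static String withDefaultScheme(String url) {
        if (url == null) {
            throw new IllegalArgumentException("url must not be null");
        }
        return hasScheme(url) ? url : "https://" + url;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultScheme("www.mysite.com/my/url with spaces.xls"));
        System.out.println(withDefaultScheme("https://www.mysite.com/file.xls"));
    }
}
```

Whether to add a default protocol or reject the input outright depends on where your URLs come from, so treat the fallback as a placeholder.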
I had some spare time, so I wrote an alternative custom encoder. The strategy is to scan for characters that are not allowed in a URL and encode only those, rather than trying to re-encode the whole thing:
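import java.nio.charset.StandardCharsets;

private static String encodeSemiEncoded(String semiEncodedUrl) {
    // Punctuation that is legal in a URL and is left untouched.
    // '%' is not listed, so an existing %XX escape comes out as %25XX.
    final String ALLOWED_CHAR = "!*'();:@&=+$,/?#[]-_.~";
    StringBuilder encoded = new StringBuilder();
    for (int i = 0; i < semiEncodedUrl.length(); ) {
        int cp = semiEncodedUrl.codePointAt(i); // code point, so surrogate pairs stay together
        i += Character.charCount(cp);
        boolean shouldEncode = cp > 127
                || (ALLOWED_CHAR.indexOf(cp) == -1 && !Character.isLetterOrDigit(cp));
        if (shouldEncode) {
            // Percent-encode each UTF-8 byte; formatting the char value itself is wrong above 127.
            for (byte b : new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8)) {
                encoded.append(String.format("%%%02X", b & 0xFF));
            }
        } else {
            encoded.append((char) cp);
        }
    }
    return encoded.toString();
}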
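One caveat: the encoder above does not list '%' among the allowed characters, so any %XX escape that is already present in the URL gets encoded a second time. If your URLs really are partially encoded, a variant along these lines might suit you better. It is only a sketch: the class `SemiEncodedUrlDemo` and the helpers `isHex`, `isPercentEscape` and `encodePreservingEscapes` are names I made up, and it assumes that a '%' followed by two hex digits is always an intentional escape:

```java
import java.nio.charset.StandardCharsets;

public class SemiEncodedUrlDemo {
    private static final String ALLOWED_CHARS = "!*'();:@&=+$,/?#[]-_.~";

    private static boolean isHex(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // True if s contains '%' plus two hex digits starting at index i.
    private static boolean isPercentEscape(String s, int i) {
        return s.charAt(i) == '%' && i + 2 < s.length()
                && isHex(s.charAt(i + 1)) && isHex(s.charAt(i + 2));
    }

    public static String encodePreservingEscapes(String url) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < url.length()) {
            if (isPercentEscape(url, i)) {
                out.append(url, i, i + 3); // copy the existing "%XX" unchanged
                i += 3;
                continue;
            }
            int cp = url.codePointAt(i);
            i += Character.charCount(cp);
            boolean keep = cp <= 127
                    && (Character.isLetterOrDigit(cp) || ALLOWED_CHARS.indexOf(cp) != -1);
            if (keep) {
                out.appendCodePoint(cp);
            } else {
                // Percent-encode every UTF-8 byte of the character.
                byte[] bytes = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
                for (byte b : bytes) {
                    out.append(String.format("%%%02X", b & 0xFF));
                }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String url = "https://www.mysite.com/my%20dir/url with spaces &\"others\".xls";
        // prints https://www.mysite.com/my%20dir/url%20with%20spaces%20&%22others%22.xls
        System.out.println(encodePreservingEscapes(url));
    }
}
```

A stray '%' that is not followed by two hex digits is still encoded as %25, so something like "100% done" becomes "100%25%20done".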
Hope this helps