How to encode a URL to be “browserable”?

执笔经年 2021-01-21 05:39

I want to know if there is any way to parse a URL like this:

https://www.mysite.com/lot/of/unpleasant/folders/and/my/url with spaces &\"others\".xls
         


        
1 answer
  • 2021-01-21 06:16

    The problem with this sort of URL is that it is partially encoded: an out-of-the-box encoder will always encode the whole string, including the parts that are already valid, so I think your approach of using a custom encoder is correct. Your code is OK; you just need to add some validation. For instance, what happens if the "evil URL" arrives without the protocol part (i.e. without the "https://")? Handle that case unless you're sure it will never happen.
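    As a rough illustration of the protocol check mentioned above, here is a minimal sketch. The class `SchemeCheck`, the method `ensureScheme`, and the choice of a default scheme are all hypothetical names and assumptions made for this example; they are not part of the asker's code.

```java
final class SchemeCheck {
    // Hypothetical helper: returns the URL unchanged when it already starts
    // with a scheme such as "https://", otherwise prepends defaultScheme.
    static String ensureScheme(String url, String defaultScheme) {
        // A scheme is a letter followed by letters, digits, '+', '.' or '-',
        // and here it must be followed by "://".
        if (url.matches("[A-Za-z][A-Za-z0-9+.-]*://.*")) {
            return url;
        }
        return defaultScheme + "://" + url;
    }

    public static void main(String[] args) {
        System.out.println(ensureScheme("www.mysite.com/my file.xls", "https"));
        System.out.println(ensureScheme("https://www.mysite.com/my file.xls", "https"));
    }
}
```

    Whether a missing protocol should be repaired like this or rejected outright depends on where the URLs come from; the sketch only shows one way to detect the case.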

    I had some spare time, so I wrote an alternative custom encoder. The strategy is to scan for characters that are not allowed in a URL and encode only those, rather than trying to re-encode the whole thing:

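    // Requires: import java.nio.charset.StandardCharsets;
    private static String encodeSemiEncoded(String semiEncodedUrl) {
        // URL punctuation that is left untouched. '%' is included so that
        // escapes already present in the input (e.g. %20) are not encoded twice.
        final String ALLOWED_CHARS = "!*'();:@&=+$,/?#[]-_.~%";
        StringBuilder encoded = new StringBuilder();
        // Iterate over UTF-8 bytes so that non-ASCII characters become valid
        // multi-byte escapes instead of a single malformed %XXXX sequence.
        for (byte b : semiEncodedUrl.getBytes(StandardCharsets.UTF_8)) {
            int value = b & 0xFF;
            boolean isAllowedAscii = value < 128
                    && (Character.isLetterOrDigit(value) || ALLOWED_CHARS.indexOf(value) != -1);
            if (isAllowedAscii) {
                encoded.append((char) value);
            } else {
                encoded.append(String.format("%%%02X", value));
            }
        }
        return encoded.toString();
    }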
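    For comparison, when the URL is not yet encoded at all and can be split into its parts, the JDK's multi-argument `java.net.URI` constructor quotes illegal characters per component. This is a different technique from the custom encoder above, shown only as a sketch: the class `UriPartsDemo` and the method `encodeParts` are hypothetical names, and note that this constructor always escapes `%`, so it is not suitable for input that is already partially encoded.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriPartsDemo {
    // Hypothetical helper: builds an encoded URL from separate, unencoded parts.
    static String encodeParts(String scheme, String host, String rawPath) {
        try {
            // The multi-argument constructor quotes characters that are illegal
            // in the path (spaces, double quotes, ...) and leaves '/' and '&' alone.
            // toASCIIString() additionally escapes any non-ASCII characters.
            return new URI(scheme, host, rawPath, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build URI from parts", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeParts("https", "www.mysite.com",
                "/lot/of/unpleasant/folders/and/my/url with spaces &\"others\".xls"));
    }
}
```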
    

    Hope this helps.
