HTTP URL Address Encoding in Java

醉酒成梦 2020-11-22 01:35

My Java standalone application gets a URL (which points to a file) from the user and I need to hit it and download it. The problem I am facing is that I am not able to encod

26 Answers
  • 2020-11-22 02:12

    None of the previous answers worked properly for me, so I wrote my own method based on them. It works for my cases, but if you find a URL that it does not handle, please let me know.

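    import java.net.MalformedURLException;
    import java.net.URI;
    import java.net.URISyntaxException;
    import java.net.URL;

    public static URL convertToURLEscapingIllegalCharacters(String toEscape) throws MalformedURLException, URISyntaxException {
        URL url = new URL(toEscape);
        // The multi-argument URI constructor percent-encodes characters that are illegal in each component.
        URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
        // It also re-encodes every '%' as "%25"; undo that so existing escapes are encoded once, not twice.
        // Caveat: a literal '%' in the input therefore stays unescaped.
        return new URL(uri.toString().replace("%25", "%"));
    }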
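
    To see what the multi-argument URI constructor does on its own, here is a minimal sketch. The class and method names (UriConstructorSketch, encodeParts) and the example.com URL are made up for illustration; only java.net.URI is real API. It shows the space being quoted, and also why the "%25" replacement above is a trade-off: the constructor always quotes '%', so a literal percent sign comes out as "%25".

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper class, for illustration only.
class UriConstructorSketch {

    // Builds a URI string from unencoded parts. The seven-argument URI
    // constructor quotes characters that are illegal in each component
    // and always quotes '%'.
    static String encodeParts(String scheme, String host, String path, String query) {
        try {
            return new URI(scheme, null, host, -1, path, query, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // prints http://search.barnesandnoble.com/booksearch/first%20book.pdf
        System.out.println(encodeParts("http", "search.barnesandnoble.com",
                "/booksearch/first book.pdf", null));
        // prints http://example.com/100%25%20done.txt
        System.out.println(encodeParts("http", "example.com", "/100% done.txt", null));
    }
}
```

    If the input is known to be completely unencoded, the constructor's output can be used directly and the replace("%25", "%") step is not needed.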
    
  • 2020-11-22 02:13

    I took the code above and reworked it a bit. I prefer to put the positive case first, and I thought a HashSet might perform better than alternatives such as searching through a String. I'm not sure the autoboxing penalty is worth it, but Character.valueOf caches the boxed values for ASCII chars, so the cost of boxing should be low.

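    /**
     * Replaces every character that is not unreserved (RFC 3986, section 2.3)
     * with the percent-encoded form of its UTF-8 bytes.
     * @param s the raw string to encode
     * @return the percent-encoded string
     */
    public static String encodeURIcomponent(String s)
    {
        StringBuilder o = new StringBuilder();
        // Encode byte by byte (needs java.nio.charset.StandardCharsets) so that
        // non-ASCII characters still produce valid two-digit escapes.
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int ch = b & 0xFF;
            if (isSafe((char) ch)) {
                o.append((char) ch);
            }
            else {
                o.append('%');
                o.append(toHex(ch / 16));
                o.append(toHex(ch % 16));
            }
        }
        return o.toString();
    }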
    
    private static char toHex(int ch)
    {
        return (char)(ch < 10 ? '0' + ch : 'A' + ch - 10);
    }
    
    // https://tools.ietf.org/html/rfc3986#section-2.3
    public static final HashSet<Character> UnreservedChars = new HashSet<Character>(Arrays.asList(
            'A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z',
            'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z',
            '0','1','2','3','4','5','6','7','8','9',
            '-','_','.','~'));
    public static boolean isSafe(char ch)
    {
        return UnreservedChars.contains(ch);
    }
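
    On the boxing question, one way to avoid it entirely is a plain boolean lookup table indexed by the byte value. This is only a sketch of that alternative, not a measured improvement; the class name TableEncoderSketch and its members are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical alternative to the HashSet<Character> lookup, for illustration only.
class TableEncoderSketch {

    // One flag per ASCII code point; true means "unreserved" per RFC 3986, section 2.3.
    private static final boolean[] UNRESERVED = new boolean[128];
    static {
        for (char c = 'A'; c <= 'Z'; c++) UNRESERVED[c] = true;
        for (char c = 'a'; c <= 'z'; c++) UNRESERVED[c] = true;
        for (char c = '0'; c <= '9'; c++) UNRESERVED[c] = true;
        for (char c : "-_.~".toCharArray()) UNRESERVED[c] = true;
    }

    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    // Percent-encodes the UTF-8 bytes of s, leaving unreserved ASCII characters as they are.
    static String encode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (v < 128 && UNRESERVED[v]) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX[v >> 4]).append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("first book.pdf")); // prints first%20book.pdf
        System.out.println(encode("\u00e9"));         // prints %C3%A9
    }
}
```

    The array lookup involves no Character objects at all, so the question of whether boxing is cached does not arise.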
    
  • 2020-11-22 02:14

    A solution I developed, which has been more stable for me than the others:

    import java.nio.charset.StandardCharsets;

    public class URLParamEncoder {

        public static String encode(String input) {
            StringBuilder resultStr = new StringBuilder();
            // Work on the UTF-8 bytes so that non-ASCII characters yield valid two-digit escapes.
            for (byte b : input.getBytes(StandardCharsets.UTF_8)) {
                int ch = b & 0xFF;
                if (isUnsafe(ch)) {
                    resultStr.append('%');
                    resultStr.append(toHex(ch / 16));
                    resultStr.append(toHex(ch % 16));
                } else {
                    resultStr.append((char) ch);
                }
            }
            return resultStr.toString();
        }

        private static char toHex(int ch) {
            return (char) (ch < 10 ? '0' + ch : 'A' + ch - 10);
        }

        // Every non-ASCII byte is unsafe, as are the listed ASCII characters.
        private static boolean isUnsafe(int ch) {
            if (ch > 127)
                return true;
            return " %$&+,/:;=?@<>#".indexOf(ch) >= 0;
        }

    }
    
  • 2020-11-22 02:14

    How about:

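    public String UrlEncode(String in_) {
        String retVal = "";
        try {
            // URLEncoder produces application/x-www-form-urlencoded output (a space becomes '+'),
            // so this suits query parameter values rather than path segments.
            retVal = URLEncoder.encode(in_, "UTF-8");
        } catch (UnsupportedEncodingException ex) {
            // Log is assumed to be the application's own logging helper.
            Log.get().exception(Log.Level.Error, "urlEncode ", ex);
        }
        return retVal;
    }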
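
    Since URLEncoder targets HTML form encoding, its output differs from what a URL path needs. The sketch below shows the difference and one possible adjustment for path segments; the class and method names (FormEncodingSketch, formEncode, pathSegmentEncode) are made up for illustration, and the adjustment is an assumption about what a path segment should look like, not part of the answer above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper class, for illustration only.
class FormEncodingSketch {

    // Plain URLEncoder output: application/x-www-form-urlencoded.
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every Java runtime.
            throw new IllegalStateException(e);
        }
    }

    // Adjusts the form-encoded output for use as a path segment:
    // '+' (an encoded space) becomes %20, '*' is escaped, '~' is restored.
    static String pathSegmentEncode(String segment) {
        return formEncode(segment)
                .replace("+", "%20")
                .replace("*", "%2A")
                .replace("%7E", "~");
    }

    public static void main(String[] args) {
        System.out.println(formEncode("first book.pdf"));        // prints first+book.pdf
        System.out.println(pathSegmentEncode("first book.pdf")); // prints first%20book.pdf
    }
}
```

    A literal '+' in the input is already encoded as %2B before the replacement runs, so replacing '+' afterwards only affects encoded spaces.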

  • 2020-11-22 02:15

    This is more of a note than an answer. URIUtil from Commons HttpClient is still the most convenient and straightforward way to encode the different parts of a URI. Unfortunately it is deprecated, and I could not readily find out why. URIUtil may not cover every corner case, but it remains the simplest approach.

    I searched for many libraries and methods to do this, but in my view none of them offered a simple approach.

    In the end I took URIUtil and the code it depends on and recompiled them, which works very well for me.

    I don't expect anyone to follow this approach, but if someone needs to, these are the classes required for compilation (from Commons HttpClient 3):

    1. org.apache.commons.httpclient.URIException
    2. org.apache.commons.httpclient.HttpClientError
    3. org.apache.commons.httpclient.NameValuePair
    4. org.apache.commons.httpclient.util.LangUtils
    5. org.apache.commons.httpclient.util.URIUtil
    6. org.apache.commons.httpclient.URI
    7. org.apache.commons.httpclient.util.EncodingUtil
  • 2020-11-22 02:18

    I've created a new project to help construct HTTP URLs. The library automatically URL-encodes path segments and query parameters.

    You can view the source and download a binary at https://github.com/Widen/urlbuilder

    The example URL in this question:

    new UrlBuilder("search.barnesandnoble.com", "booksearch/first book.pdf").toString()
    

    produces

    http://search.barnesandnoble.com/booksearch/first%20book.pdf
