HttpClient problem with URLs which include curly braces

情书的邮戳 2020-12-19 10:36

I am using HttpClient in my Android application. At some point, I have to fetch data from remote locations. Below is the snippet showing how I used HttpClient to get the res

2 Answers
  • 2020-12-19 10:50

    The strict answer is that you should never have curly braces in your URL.

    A full description of valid URLs can be found in RFC 1738.

    The pertinent part for this answer is as follows:

    Unsafe:

    Characters can be unsafe for a number of reasons. The space
    character is unsafe because significant spaces may disappear and
    insignificant spaces may be introduced when URLs are transcribed or
    typeset or subjected to the treatment of word-processing programs.
    The characters "<" and ">" are unsafe because they are used as the
    delimiters around URLs in free text; the quote mark (""") is used to
    delimit URLs in some systems. The character "#" is unsafe and should
    always be encoded because it is used in World Wide Web and in other
    systems to delimit a URL from a fragment/anchor identifier that might
    follow it. The character "%" is unsafe because it is used for
    encodings of other characters. Other characters are unsafe because
    gateways and other transport agents are known to sometimes modify
    such characters. These characters are "{", "}", "|", "\", "^", "~",
    "[", "]", and "`".

    All unsafe characters must always be encoded within a URL. For
    example, the character "#" must be encoded within URLs even in
    systems that do not normally deal with fragment or anchor
    identifiers, so that if the URL is copied into another system that
    does use them, it will not be necessary to change the URL encoding.

    To get around the problem you have been experiencing, you must encode your URL.
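
To make "encode" concrete: percent-encoding replaces a character with "%" followed by its two-digit hexadecimal ASCII code. The sketch below applies that only to the last group of characters in the RFC 1738 quote above ("{", "}", "|", "\", "^", "~", "[", "]", "`"). The class and method names are hypothetical, chosen for illustration, and this is not a general-purpose URL encoder (it ignores space, "<", ">", the quote mark, "#" and "%").

```java
class UnsafeCharEncoding {
    // Only the characters RFC 1738 says gateways and transport agents may modify.
    static final String UNSAFE = "{}|\\^~[]`";

    // Percent-encodes one ASCII character as "%" plus two uppercase hex digits.
    static String percentEncode(char c) {
        return String.format("%%%02X", (int) c);
    }

    // Replaces each listed character in the input; everything else is left alone.
    static String encodeUnsafe(String path) {
        StringBuilder out = new StringBuilder();
        for (char c : path.toCharArray()) {
            out.append(UNSAFE.indexOf(c) >= 0 ? percentEncode(c) : String.valueOf(c));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints /abc/%7B5D/%7BB0blhahblah-blah%7DI1.jpg
        System.out.println(encodeUnsafe("/abc/{5D/{B0blhahblah-blah}I1.jpg"));
    }
}
```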

    The "host may not be null" error occurs when the entire URL is encoded, including the https://mydomain.com/ part: once the "://" separator has been escaped, the scheme and host can no longer be recognized. You only want to encode the part of the URL after the host, called the path.

    The solution is to use the Uri.Builder class to build your URI from its individual parts, which encodes the path in the process.

    You will find a detailed description in the Android SDK Uri.Builder reference documentation.

    Some trivial examples using your values are:

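        // Requires: import android.net.Uri;
        // path() leaves '/' intact and percent-encodes other characters as needed.
        Uri.Builder b = Uri.parse("https://mydomain.com").buildUpon();
        b.path("/abc/{5D/{B0blhahblah-blah}I1.jpg");
        Uri u = b.build();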
    

    Or you can use chaining:

        Uri u = Uri.parse("https://mydomain.com").buildUpon().path("/abc/{5D/{B0blhahblah-blah}I1.jpg").build();
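
The difference between encoding only the path and encoding the whole URL can also be tried off-device. The sketch below uses java.net.URI and java.net.URLEncoder from the standard library as stand-ins for android.net.Uri (an assumption made so that it runs on a plain JVM; the Android classes are not used here). PathEncodingDemo and its method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

class PathEncodingDemo {
    // Builds the URI from its parts; the multi-argument constructor
    // percent-encodes characters that are illegal in the path.
    static URI encodePathOnly(String scheme, String host, String path)
            throws URISyntaxException {
        return new URI(scheme, host, path, null);
    }

    // Encodes the whole string, scheme and host included (the failure mode).
    static String encodeEverything(String url) throws UnsupportedEncodingException {
        return URLEncoder.encode(url, "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String path = "/abc/{5D/{B0blhahblah-blah}I1.jpg";

        URI good = encodePathOnly("https", "mydomain.com", path);
        System.out.println(good);           // braces escaped, host intact
        System.out.println(good.getHost()); // mydomain.com

        String mangled = encodeEverything("https://mydomain.com" + path);
        System.out.println(mangled);        // "://" is now "%3A%2F%2F"
        System.out.println(new URI(mangled).getHost()); // null
    }
}
```

With the whole string encoded, the parser sees a single relative path with no scheme or host, which is consistent with a "host may not be null" complaint further down the stack.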
    
  • 2020-12-19 11:01

    Except that RFC 1738 has been obsolete for over a decade. It has been superseded by RFC 3986, and there is no indication in:

    https://tools.ietf.org/html/rfc3986

    that curly braces are unsafe (in fact, the RFC does not contain a single curly brace character anywhere). Furthermore, I've tried URIs containing curly braces in browsers, and they work fine.

    Also note that the OP is using a class called URI, which should definitely be following RFC 3986 at the very least, if not RFC 3987.

    However, oddly, RFC 3987, which defines IRIs:

    https://tools.ietf.org/html/rfc3987

    has the note that:

    Systems accepting IRIs MAY also deal with the printable characters in
    US-ASCII that are not allowed in URIs, namely "<", ">", '"', space,
    "{", "}", "|", "\", "^", and "`", in step 2 above. If these characters
    are found but are not converted, then the conversion SHOULD fail.
    Please note that the number sign ("#"), the percent sign ("%"), and
    the square bracket characters ("[", "]") are not part of the above
    list and MUST NOT be converted.

    In other words, it looks like the RFCs themselves have some issues.
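
Whatever the RFCs intend, it is easy to check how a given parser treats the characters in that note. The sketch below feeds each one to java.net.URI, once raw and once percent-encoded. It assumes the OP's URI class is java.net.URI, which the truncated question does not confirm; note that its Javadoc describes it as following RFC 2396 as amended by RFC 2732, not RFC 3986. UriStrictnessCheck and parses are hypothetical names.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriStrictnessCheck {
    // The ASCII characters the RFC 3987 note lists as not allowed in URIs.
    static final String LISTED = "<>\" {}|\\^`";

    // True if java.net.URI parses the string without throwing.
    static boolean parses(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        for (char c : LISTED.toCharArray()) {
            String raw = "https://mydomain.com/a" + c + "b";
            String escaped = "https://mydomain.com/a"
                    + String.format("%%%02X", (int) c) + "b";
            System.out.println("'" + c + "' raw=" + parses(raw)
                    + " escaped=" + parses(escaped));
        }
    }
}
```

On a standard JVM, every raw form should be rejected and every percent-encoded form accepted, so with this class the braces have to be escaped regardless of which RFC one reads.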
