Java URI class: constructor determines whether or not query is encoded?

≡放荡痞女 submitted on 2020-05-09 06:36:14

Question


Is this behavior intentional?

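import java.net.URI;
import java.net.URISyntaxException;

//create the same URI using two different constructors

URI foo = null, bar = null;
try {
    //constructor: URI(String str)
    foo = new URI("http://localhost/index.php?token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB");
} catch (URISyntaxException e) {
    e.printStackTrace(); // report parse errors instead of swallowing them
}
try {
    //constructor: URI(scheme, authority, path, query, fragment)
    bar = new URI("http", "localhost", "/index.php", "token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB", null);
} catch (URISyntaxException e) {
    e.printStackTrace();
}

System.out.println("foo.getQuery() = " + foo.getQuery());
System.out.println("bar.getQuery() = " + bar.getQuery());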

//the output:
//foo.getQuery() = token=4/4EzdsSBg_4vX6D5pzvdsMLDoyItB
//bar.getQuery() = token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB

The URI(String uri) constructor seems to be decoding the query part of the URI. I thought the query portion is supposed to be encoded? And why doesn't the other constructor decode the query part?


Answer 1:


From the URI Javadoc:

The single-argument constructor requires any illegal characters in its argument to be quoted and preserves any escaped octets and other characters that are present.

The multi-argument constructors quote illegal characters as required by the components in which they appear. The percent character ('%') is always quoted by these constructors. Any other characters are preserved.

Thus URI(String) expects you to encode everything correctly and assumes %2F is such an encoded octet, which getQuery() decodes to /.
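A minimal sketch of that behaviour, assuming any reasonably modern JDK. The class SingleArgDemo and its parse() helper are hypothetical; URI.create is used because it works like the single-argument constructor but wraps URISyntaxException in an unchecked IllegalArgumentException. Comparing getRawQuery() with getQuery() shows that the stored query still contains %2F, and only the decoding accessor turns it into /.

```java
import java.net.URI;

class SingleArgDemo {
    // Hypothetical helper: URI.create(s) behaves like new URI(s), so the
    // string must already be a correctly encoded URI.
    static URI parse() {
        return URI.create("http://localhost/index.php?token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB");
    }

    public static void main(String[] args) {
        URI foo = parse();
        // Raw accessor: the escaped octet %2F is kept exactly as given.
        System.out.println(foo.getRawQuery()); // token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB
        // Decoding accessor: %2F is turned back into '/'.
        System.out.println(foo.getQuery());    // token=4/4EzdsSBg_4vX6D5pzvdsMLDoyItB
    }
}
```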

The other constructors encode the % character (turning the input %2F into %252F), so after decoding you still get %2F.
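The same comparison for the multi-argument constructor, again as a small sketch. MultiArgDemo and build() are hypothetical names; build() only wraps URI(scheme, authority, path, query, fragment) so the checked URISyntaxException does not leak out. The raw query shows the extra %25 introduced by quoting the percent sign.

```java
import java.net.URI;
import java.net.URISyntaxException;

class MultiArgDemo {
    // Hypothetical helper around the five-argument constructor.
    static URI build(String query) {
        try {
            return new URI("http", "localhost", "/index.php", query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI bar = build("token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB");
        // '%' is always quoted, so the stored query contains %252F.
        System.out.println(bar.getRawQuery()); // token=4%252F4EzdsSBg_4vX6D5pzvdsMLDoyItB
        // Decoding %25 yields '%' again, so the input comes back unchanged.
        System.out.println(bar.getQuery());    // token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB
        System.out.println(bar);               // http://localhost/index.php?token=4%252F4EzdsSBg_4vX6D5pzvdsMLDoyItB
    }
}
```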

I assume the purpose of the difference between the constructors is to allow things like new URI(otherUri.toString()), with toString() returning a fully encoded URI.
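A sketch of the round trip that this design makes possible, assuming the multi-argument constructor is used to build the original URI. RoundTripDemo, original() and copyOf() are hypothetical names. Because toString() returns the already-encoded form, feeding it back through the single-argument path (here URI.create) does not quote anything a second time, and the copy compares equal to the original.

```java
import java.net.URI;
import java.net.URISyntaxException;

class RoundTripDemo {
    // Builds the question's "bar" URI with the multi-argument constructor.
    static URI original() {
        try {
            return new URI("http", "localhost", "/index.php",
                    "token=4%2F4EzdsSBg_4vX6D5pzvdsMLDoyItB", null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // toString() is fully encoded, so re-parsing it preserves every escape.
    static URI copyOf(URI uri) {
        return URI.create(uri.toString());
    }

    public static void main(String[] args) {
        URI bar = original();
        URI copy = copyOf(bar);
        System.out.println(copy.equals(bar));                       // true
        System.out.println(copy.getRawQuery().equals(bar.getRawQuery())); // true
    }
}
```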




Answer 2:


A quick analysis:

foo

The constructor parses the input URI as-is, and getQuery() then unquotes the escaped octet %2F to /. This is what we expect.

bar

With the constructor used in the bar example, the query part is taken as a raw String that may contain illegal characters and is quoted first, so the % of %2F is encoded and the value becomes %252F. When getQuery() later decodes the query, the result is (again) %2F.

Lesson learned: With the first constructor we pass an RFC 2396 compliant URI. The other constructors take normal Strings (unquoted illegal chars) and URI constructs an RFC 2396 compliant representation.
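To illustrate the second half of that lesson, here is a minimal sketch that passes an unquoted query to the multi-argument constructor and lets URI do the quoting. UnquotedInputDemo and build() are hypothetical names, and the "q=hello world" input is an illustrative value not taken from the question.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UnquotedInputDemo {
    // Pass the decoded query; URI quotes only what is illegal in a query.
    static URI build(String decodedQuery) {
        try {
            return new URI("http", "localhost", "/index.php", decodedQuery, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI token = build("token=4/4EzdsSBg_4vX6D5pzvdsMLDoyItB");
        System.out.println(token.getQuery());    // token=4/4EzdsSBg_4vX6D5pzvdsMLDoyItB
        // '/' is legal in a query, so it is not quoted in the raw form.
        System.out.println(token.getRawQuery()); // token=4/4EzdsSBg_4vX6D5pzvdsMLDoyItB

        // A space is illegal in a query, so the constructor quotes it as %20.
        URI spaced = build("q=hello world");
        System.out.println(spaced.getRawQuery()); // q=hello%20world
        System.out.println(spaced.getQuery());    // q=hello world
    }
}
```

Note that this does not reproduce the %2F of the original string: the multi-argument constructors quote only characters that are illegal in the component, and / is legal in a query. If the escaped form itself is required, the already-encoded string has to go through the single-argument constructor instead.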

Here's a working example on IDEONE (with extra supporting output)



Source: https://stackoverflow.com/questions/5828722/java-uri-class-constructor-determines-whether-or-not-query-is-encoded
