I believe the definition and implementation of Java's URI.resolve method are incompatible with RFC 3986 section 5.2.2. I understand that the Java API defines how that method
Yes, I agree that the URI.resolve(URI) method is incompatible with RFC 3986. The original question, on its own, presents a fantastic amount of research that contributes to this conclusion. First, let's clear up any confusion.
As Raedwald explained (in a now deleted answer), there is a distinction between base paths that end with / and those that do not:
fizz relative to /foo/bar is: /foo/fizz
fizz relative to /foo/bar/ is: /foo/bar/fizz
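The trailing-slash rule is easy to confirm with java.net.URI itself. A minimal sketch (the class name TrailingSlashDemo and its helper are only for illustration):

```java
import java.net.URI;

public class TrailingSlashDemo {
    // Resolves a relative reference against a base and returns the result as text.
    static String resolve(String base, String reference) {
        return URI.create(base).resolve(reference).toString();
    }

    public static void main(String[] args) {
        // No trailing slash: the last segment "bar" is replaced.
        System.out.println(resolve("/foo/bar", "fizz"));   // /foo/fizz
        // Trailing slash: "fizz" is appended below "bar".
        System.out.println(resolve("/foo/bar/", "fizz"));  // /foo/bar/fizz
    }
}
```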
While correct, it's not a complete answer because the original question is not asking about a path (i.e. "fizz", above). Instead, the question is concerned with the separate query component of the relative URI reference. The URI class constructor used in the example code accepts five distinct String arguments, and all but the queryString argument were passed as null. (Note that Java accepts a null String as the path parameter and this logically results in an "empty" path component because "the path component is never undefined" though it "may be empty (zero length)".) This will be important later.
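To make the "empty path" point concrete, here is a small sketch of what the five-argument constructor produces when only the query is supplied. The class name and the sample query a=b are illustrative and not taken from the question:

```java
import java.net.URI;
import java.net.URISyntaxException;

public class QueryOnlyReferenceDemo {
    // Builds a relative reference that carries only a query component.
    static URI queryOnly(String query) {
        try {
            return new URI(null, null, null, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI reference = queryOnly("a=b");
        System.out.println(reference);                      // ?a=b
        System.out.println(reference.getPath().isEmpty());  // true: the path is empty, not null
        System.out.println(reference.getQuery());           // a=b
    }
}
```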
In an earlier comment, Sajan Chandran pointed out that the java.net.URI class is documented to implement RFC 2396 and not the subject of the question, RFC 3986. The former was obsoleted by the latter in 2005. That the URI class Javadoc does not mention the newer RFC could be interpreted as more evidence of its incompatibility. Let's pile on some more:
JDK-6791060 is an open issue that suggests this class "should be updated for RFC 3986". A comment there warns that "RFC3986 is not completely backwards compatible with 2396".
Previous attempts were made to update parts of the URI class to be compliant with RFC 3986, such as JDK-6348622, but were then rolled back for breaking backwards compatibility. (Also see this discussion on the JDK mailing list.)
Although the path "merge" logic sounds similar, as noted by SubOptimal, the pseudocode specified in the newer RFC does not match the actual implementation. In the pseudocode, when the relative URI's path is empty, then the resulting target path is copied as-is from the base URI. The "merge" logic is not executed under those conditions. Contrary to that specification, Java's URI implementation trims the base path after the last / character, as observed in the question.
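The difference can be made concrete with a simplified transcription of the path-selection branch of the RFC 3986 section 5.2.2 pseudocode. This is only a sketch: it covers references that have neither scheme nor authority, it leaves out remove_dot_segments entirely, and the class and method names are my own.

```java
import java.net.URI;

public class Rfc3986PathSketch {
    // RFC 3986 section 5.2.3 "merge": keep the base path up to and including
    // its last "/", then append the reference path.
    static String merge(String basePath, boolean baseHasAuthority, String refPath) {
        if (baseHasAuthority && basePath.isEmpty()) {
            return "/" + refPath;
        }
        // lastIndexOf returns -1 when there is no "/", which drops the whole base path.
        return basePath.substring(0, basePath.lastIndexOf('/') + 1) + refPath;
    }

    // Path selection from section 5.2.2 for a reference without scheme or
    // authority. Dot-segment removal is deliberately omitted in this sketch.
    static String targetPath(URI base, URI reference) {
        String refPath = reference.getRawPath();
        if (refPath.isEmpty()) {
            return base.getRawPath();   // copied as-is; merge is never called
        }
        if (refPath.startsWith("/")) {
            return refPath;
        }
        return merge(base.getRawPath(), base.getRawAuthority() != null, refPath);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/something/more/long");
        System.out.println(targetPath(base, URI.create("?action=aaaa"))); // /something/more/long
        System.out.println(targetPath(base, URI.create("fizz")));         // /something/more/fizz
    }
}
```

With the question's base URI, the empty-path branch returns /something/more/long unchanged, which is the result the question expected from URI.resolve.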
There are alternatives to the URI class, if you want RFC 3986 behavior. Java EE 6 implementations provide javax.ws.rs.core.UriBuilder, which (in Jersey 1.18) seems to behave as you expected (see below). It at least claims awareness of the RFC as far as encoding different URI components is concerned.
Outside of Java EE, Spring 3.0 introduced UriUtils, specifically documented for "encoding and decoding based on RFC 3986". Spring 3.1 deprecated some of that functionality and introduced UriComponentsBuilder, but unfortunately it does not document adherence to any specific RFC.
Test program, demonstrating different behaviors:
import java.net.*;
import java.util.*;
import java.util.function.*;
import javax.ws.rs.core.UriBuilder; // using Jersey 1.18

public class StackOverflow22203111 {

    private URI withResolveURI(URI base, String targetQuery) {
        URI reference = queryOnlyURI(targetQuery);
        return base.resolve(reference);
    }

    private URI withUriBuilderReplaceQuery(URI base, String targetQuery) {
        UriBuilder builder = UriBuilder.fromUri(base);
        return builder.replaceQuery(targetQuery).build();
    }

    private URI withUriBuilderMergeURI(URI base, String targetQuery) {
        URI reference = queryOnlyURI(targetQuery);
        UriBuilder builder = UriBuilder.fromUri(base);
        return builder.uri(reference).build();
    }

    public static void main(String... args) throws Exception {
        final URI base = new URI("http://example.com/something/more/long");
        final String queryString = "query=http://local:282/rand&action=aaaa";
        final String expected =
            "http://example.com/something/more/long?query=http://local:282/rand&action=aaaa";

        StackOverflow22203111 test = new StackOverflow22203111();
        Map<String, BiFunction<URI, String, URI>> strategies = new LinkedHashMap<>();
        strategies.put("URI.resolve(URI)", test::withResolveURI);
        strategies.put("UriBuilder.replaceQuery(String)", test::withUriBuilderReplaceQuery);
        strategies.put("UriBuilder.uri(URI)", test::withUriBuilderMergeURI);

        strategies.forEach((name, method) -> {
            System.out.println(name);
            URI result = method.apply(base, queryString);
            if (expected.equals(result.toString())) {
                System.out.println(" MATCHES: " + result);
            }
            else {
                System.out.println(" EXPECTED: " + expected);
                System.out.println(" but WAS: " + result);
            }
        });
    }

    private URI queryOnlyURI(String queryString)
    {
        try {
            String scheme = null;
            String authority = null;
            String path = null;
            String fragment = null;
            return new URI(scheme, authority, path, queryString, fragment);
        }
        catch (URISyntaxException syntaxError) {
            throw new IllegalStateException("unexpected", syntaxError);
        }
    }
}
Outputs:
URI.resolve(URI)
EXPECTED: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
but WAS: http://example.com/something/more/?query=http://local:282/rand&action=aaaa
UriBuilder.replaceQuery(String)
MATCHES: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
UriBuilder.uri(URI)
MATCHES: http://example.com/something/more/long?query=http://local:282/rand&action=aaaa
If you want better behavior from URI.resolve() and do not want to include another large dependency in your program, then I found the following code to work well within my requirements:
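// Requires: import com.google.common.base.Strings; (Guava)
public URI resolve(URI base, URI relative) throws URISyntaxException {
    // Give an empty base path a root so that relative segments resolve under "/".
    if (Strings.isNullOrEmpty(base.getPath()))
        base = new URI(base.getScheme(), base.getAuthority(), "/",
                base.getQuery(), base.getFragment());
    // Let an empty relative path inherit the full base path.
    if (Strings.isNullOrEmpty(relative.getPath()))
        relative = new URI(relative.getScheme(), relative.getAuthority(), base.getPath(),
                relative.getQuery(), relative.getFragment());
    return base.resolve(relative);
}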
The only non-JDK thing there is Strings from Guava, for readability; replace it with your own one-line method if you don't have Guava.
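For reference, a JDK-only variant of the same workaround might look like the sketch below. The class name, the isNullOrEmpty helper, and the choice to wrap URISyntaxException in an unchecked exception are my assumptions rather than part of the snippet above.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriResolveWorkaround {
    // Stand-in for Guava's Strings.isNullOrEmpty.
    static boolean isNullOrEmpty(String s) {
        return s == null || s.isEmpty();
    }

    static URI resolve(URI base, URI relative) {
        try {
            // An empty base path becomes "/" so relative segments have a root.
            if (isNullOrEmpty(base.getPath())) {
                base = new URI(base.getScheme(), base.getAuthority(), "/",
                        base.getQuery(), base.getFragment());
            }
            // An empty relative path inherits the complete base path.
            if (isNullOrEmpty(relative.getPath())) {
                relative = new URI(relative.getScheme(), relative.getAuthority(),
                        base.getPath(), relative.getQuery(), relative.getFragment());
            }
            return base.resolve(relative);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("http://example.com/something/more/long");
        // Query-only reference keeps the last path segment.
        System.out.println(resolve(base, URI.create("?action=aaaa")));
        // http://example.com/something/more/long?action=aaaa

        // Base without a path still yields a well-formed result.
        System.out.println(resolve(URI.create("http://example.com"), URI.create("fizz")));
        // http://example.com/fizz
    }
}
```

I have only traced this sketch for hierarchical URIs like the ones shown. Opaque URIs (for example mailto: addresses) and fragment-only references are not handled.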
For me there is no discrepancy with the Java behaviour.
In RFC 2396, section 5.2, step 6a:
All but the last segment of the base URI's path component is copied to the buffer. In other words, any characters after the last (right-most) slash character, if any, are excluded.
In RFC 3986, section 5.2.3:
return a string consisting of the reference's path component appended to all but the last segment of the base URI's path (i.e., excluding any characters after the right-most "/" in the base URI path, or excluding the entire base URI path if it does not contain any "/" characters).