Question
Uploading multiple files should be significantly faster when using the HTTP/2 multiplexing feature.
Java has an HttpClient that natively supports the HTTP/2 protocol, so I tried to write the code for my own understanding.
The task turned out to be harder than I initially thought; alternatively, it may be that I was not able to find a server that can use multiplexing for uploads (if one exists).
This is the code I wrote. Does anyone have thoughts on it?
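import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Ask for HTTP/2; the client falls back to HTTP/1.1 if the server does not support it.
HttpClient httpClient = HttpClient.newBuilder().version(HttpClient.Version.HTTP_2).build();
String url = "https://your-own-http2-server.com/incoming-files/%s";
Path basePath = Path.of("/path/to/directory/where/is/a/bunch/of/jpgs");

// Starts an asynchronous PUT of one file and returns the pending response.
Function<Path, CompletableFuture<HttpResponse<String>>> handleFile = file -> {
    String currentUrl = String.format(url, file.getFileName().toString());
    try {
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(currentUrl))
                .header("Content-Type", "image/jpeg")
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
        return httpClient.sendAsync(request, HttpResponse.BodyHandlers.ofString());
    } catch (IOException e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    }
};

// Files.list keeps the directory open, so close the stream once the paths are collected.
List<Path> files;
try (Stream<Path> listing = Files.list(basePath)) {
    files = listing.collect(Collectors.toList());
}
files.parallelStream().map(handleFile).forEach(c -> {
    try {
        final HttpResponse<String> response = c.get();
        System.out.println(response.statusCode());
    } catch (Exception e) {
        e.printStackTrace();
        throw new RuntimeException(e);
    }
});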
Answer 1:
The question claims that the HTTP/2 multiplexing feature should significantly improve performance when uploading multiple files.
That is an assumption that is generally wrong.
Let's discard the case where you have multiple HTTP/1.1 connections so you can upload in parallel.
We then have 1 TCP connection and we want to compare the upload with HTTP/1.1 and HTTP/2.
In HTTP/1.1, the requests will be serialized one after the other, so the end time of the multiple uploads depends on the bandwidth of the connection (ignoring TCP slow start).
In HTTP/2, the requests will be interleaved by multiplexing. However, the data that needs to be sent is the same, so again the end time of the multiple uploads depends on the bandwidth of the connection.
In HTTP/1.1 you will have upload1.start...upload1.end|upload2.start...upload2.end|upload3.start...upload3.end, etc.
In HTTP/2 you will have upload1.start|upload2.start|upload3.start.....upload3.end..upload1.end..upload2.end
The end time would be the same.
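The two timelines above can be checked with a rough model. This is only a sketch: it assumes a fixed-bandwidth link with no flow control limits and no protocol overhead, and the class name, file sizes, bandwidth, and frame size are made up for illustration.

```java
import java.util.Arrays;

class UploadTimeline {

    // HTTP/1.1 on one connection: uploads run back to back, so the last one
    // ends once the sum of all bytes has been sent.
    static double serializedEndSeconds(long[] sizes, double bytesPerSecond) {
        double clock = 0;
        for (long size : sizes) {
            clock += size / bytesPerSecond;
        }
        return clock;
    }

    // HTTP/2 on one connection: uploads are interleaved round-robin in frames,
    // but each frame still occupies the link for frameBytes / bandwidth.
    static double multiplexedEndSeconds(long[] sizes, double bytesPerSecond, int frameBytes) {
        long[] remaining = Arrays.copyOf(sizes, sizes.length);
        double clock = 0;
        boolean pending = true;
        while (pending) {
            pending = false;
            for (int i = 0; i < remaining.length; i++) {
                if (remaining[i] > 0) {
                    long chunk = Math.min(frameBytes, remaining[i]);
                    remaining[i] -= chunk;
                    clock += chunk / bytesPerSecond;
                    pending = true;
                }
            }
        }
        return clock;
    }

    public static void main(String[] args) {
        long[] sizes = {3_000_000, 1_000_000, 2_000_000}; // hypothetical file sizes
        double bandwidth = 1_000_000;                      // hypothetical: 1 MB/s
        System.out.printf("HTTP/1.1 ends at %.2f s%n", serializedEndSeconds(sizes, bandwidth));
        System.out.printf("HTTP/2 ends at   %.2f s%n", multiplexedEndSeconds(sizes, bandwidth, 16_384));
    }
}
```

In this model the individual uploads finish at different moments, but the last byte leaves at the same time in both cases, because the link carries the same total number of bytes.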
The problem with HTTP/2 is that you are typically not limited by the bandwidth of the connection, but by the HTTP/2 flow control window, which is typically much, much smaller.
The HTTP/2 specification defaults the HTTP/2 flow control window at 65535 bytes, which means that every 65535 bytes the client must stop sending data until the server acknowledges those bytes. This may take a roundtrip, so even if the roundtrip is small (e.g. 50 ms) for large file uploads you may be paying this roundtrip multiple times, adding seconds to your uploads (e.g. for a 6 MiB upload you may be paying this cost 100 times, which is 5 seconds).
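The arithmetic behind that estimate can be sketched as follows. It assumes the worst case described above, where the client drains the whole window and then waits one full round trip before it may send again; the class and method names are illustrative only.

```java
class FlowControlStallEstimate {

    // Number of times the window is exhausted while data is still left to send.
    static long stalls(long uploadBytes, long windowBytes) {
        return (uploadBytes - 1) / windowBytes;
    }

    // Total time spent waiting for acknowledgements, one round trip per stall.
    static double stallSeconds(long uploadBytes, long windowBytes, double roundTripSeconds) {
        return stalls(uploadBytes, windowBytes) * roundTripSeconds;
    }

    public static void main(String[] args) {
        long upload = 6L * 1024 * 1024; // 6 MiB
        double roundTrip = 0.050;       // 50 ms
        long defaultWindow = 65_535;    // HTTP/2 default initial window
        long largeWindow = 1L << 20;    // 1 MiB, a hypothetical tuned value

        System.out.println("default window: " + stalls(upload, defaultWindow) + " stalls, "
                + stallSeconds(upload, defaultWindow, roundTrip) + " s waiting");
        System.out.println("1 MiB window:   " + stalls(upload, largeWindow) + " stalls, "
                + stallSeconds(upload, largeWindow, roundTrip) + " s waiting");
    }
}
```

With the default window the 6 MiB upload stalls 96 times, which is where the "about 100 round trips, about 5 seconds" figure comes from.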
It is then very important that you configure the server with a large HTTP/2 flow control window, especially if your server is used for file uploads. A large HTTP/2 flow control window on the server means that the server must be prepared to buffer a large number of bytes, which means that an HTTP/2 server that primarily handles file uploads will need more memory than an HTTP/1.1 server.
With larger HTTP/2 flow control windows, the server may be smart and send acknowledgements to the client while the client is still uploading.
When a client uploads, it reduces its "send" window. By receiving acknowledgements from the server, the client enlarges the "send" window.
A typical bad interaction would be, indicating the client "send" window value, starting at 1 MiB:
[client send window]
1048576
client sends 262144 bytes
786432
client sends 262144 bytes
524288
client sends 262144 bytes
262144
client sends 262144 bytes
0
client cannot send
.
. (stalled)
.
client receives acknowledgment from server (524288 bytes)
524288
client sends 262144 bytes
262144
client sends 262144 bytes
0
client cannot send
.
. (stalled)
.
A good interaction would be:
[client send window]
1048576
client sends 262144 bytes
786432
client sends 262144 bytes
524288
client sends 262144 bytes
262144
client receives acknowledgment from server (524288 bytes)
786432
client sends 262144 bytes
524288
client sends 262144 bytes
262144
client receives acknowledgment from server (524288 bytes)
786432
As you can see in the good interaction, the server is acknowledging the client before the client exhausts the "send" window, so the client can keep sending at full speed.
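The difference between the two interactions can be reproduced with a toy simulation. This is a simplified model, not how any particular server behaves: time advances in slots (one slot is the time to send one chunk), the server acknowledges every `ackBytes` bytes it receives, and each acknowledgement reaches the client `ackDelaySlots` slots after the chunk that triggered it. All names and parameters are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class SendWindowSimulation {

    // Returns the number of slots in which the client could not send
    // because its "send" window was smaller than one chunk.
    static int stalledSlots(long window, long chunk, long ackBytes,
                            int ackDelaySlots, int chunksToSend) {
        Deque<Integer> ackArrivals = new ArrayDeque<>(); // slot at which each ack arrives
        long unacked = 0; // bytes received by the server but not yet acknowledged
        int sent = 0;
        int stalled = 0;
        for (int slot = 0; sent < chunksToSend; slot++) {
            // Acknowledgements that have arrived enlarge the "send" window.
            while (!ackArrivals.isEmpty() && ackArrivals.peek() <= slot) {
                ackArrivals.poll();
                window += ackBytes;
            }
            if (window >= chunk) {
                window -= chunk; // sending reduces the "send" window
                sent++;
                unacked += chunk;
                if (unacked >= ackBytes) {
                    unacked -= ackBytes;
                    ackArrivals.add(slot + ackDelaySlots);
                }
            } else {
                if (ackArrivals.isEmpty()) {
                    throw new IllegalStateException("no acknowledgement pending: client would stall forever");
                }
                stalled++;
            }
        }
        return stalled;
    }

    public static void main(String[] args) {
        long window = 1_048_576, chunk = 262_144, ack = 524_288;
        System.out.println("late acks:  " + stalledSlots(window, chunk, ack, 5, 16) + " stalled slots");
        System.out.println("early acks: " + stalledSlots(window, chunk, ack, 2, 16) + " stalled slots");
    }
}
```

With the short delay the window follows the good interaction above (1048576, 786432, 524288, 262144, then back to 786432) and never reaches zero; with the long delay the client repeatedly drains the window and waits.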
Multiplexing is really effective for many small requests, which is the browser use case: many small GET requests (with no request content) can be multiplexed in HTTP/2, so they arrive at the server well before the corresponding HTTP/1.1 requests would, are served earlier, and their responses reach the browser earlier.
For large requests, as is the case for file uploads, HTTP/2 can be as efficient as HTTP/1.1, but I would not be surprised if the server's default configuration made it much less performant than HTTP/1.1. HTTP/2 will require some tuning of the server configuration.
The HTTP/2 flow control window can also get in the way of downloads, so downloading large content from a server over HTTP/2 may be really slow (for the same reasons explained above).
Browsers avoid this issue by telling the server to use a really large server "send" window (Firefox 72 sets it to 12 MiB per connection), and they are very smart about acknowledging the server so that downloads do not stall.
Answer 2:
The java.net.http.HttpClient handles bytes supplied through the BodyPublisher as raw body data, without any interpretation. To illustrate: whether you use HttpRequest.BodyPublishers::ofFile(Path) or HttpRequest.BodyPublishers::ofByteArray(byte[]) is semantically irrelevant; the only difference is how the bytes that will be transmitted to the remote party are obtained. In the case of a file upload, the server probably expects the request body to be formatted in a certain way. It might also expect specific headers to be sent with the request (such as Content-Type). The HttpClient will not do that magically for you. At this time, this is something that the API does not offer out of the box. You would need to implement it at the caller level.
(There is an RFE logged to investigate support for multipart/form-data, but it has not made it into the API yet; see https://bugs.openjdk.java.net/browse/JDK-8235761)
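As an illustration of what "implementing it at the caller level" could look like, here is a minimal sketch that assumes the server expects a single-file multipart/form-data body. The class name, the helper `filePart`, the field name `file`, the file name, and the boundary format are all hypothetical; a real server may expect a different format or different headers entirely.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.UUID;

class MultipartSketch {

    // Builds the three byte chunks of a one-part multipart/form-data body:
    // part headers, the raw file content, and the closing boundary.
    static List<byte[]> filePart(String boundary, String fieldName, String fileName,
                                 String contentType, byte[] content) {
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n\r\n";
        String tail = "\r\n--" + boundary + "--\r\n";
        return List.of(head.getBytes(StandardCharsets.UTF_8),
                content,
                tail.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        String boundary = "sketch-" + UUID.randomUUID();
        byte[] placeholder = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF}; // stand-in for real JPEG bytes

        // The caller sets the Content-Type header itself; HttpClient does not add it.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://your-own-http2-server.com/incoming-files"))
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .POST(HttpRequest.BodyPublishers.ofByteArrays(
                        filePart(boundary, "file", "photo.jpg", "image/jpeg", placeholder)))
                .build();

        // Only builds the request; nothing is sent here.
        System.out.println(request.method() + " " + request.uri());
    }
}
```

The request would then be passed to `httpClient.sendAsync(...)` exactly as in the question's code; the only change is that the body bytes and the Content-Type header are assembled by the caller.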
Source: https://stackoverflow.com/questions/60098299/how-to-use-multiplexing-http2-feature-when-uploading