How can I more efficiently download large files over HTTP?

Submitted by 旧街凉风 on 2021-01-24 08:01:07

Question


I'm trying to download large files (<1GB) in Kotlin. I'm already using okhttp, and I pretty much just used the answer from this question, except that I'm using Kotlin instead of Java, so the syntax is slightly different.

val client = OkHttpClient()
val request = Request.Builder().url(urlString).build()
val response = client.newCall(request).execute()

val input = BufferedInputStream(response.body()!!.byteStream())
val output = FileOutputStream(file)

val data = ByteArray(1024)
var total = 0L
var count = input.read(data)
while (count != -1) {
    total += count
    output.write(data, 0, count)
    count = input.read(data)
}

output.flush()
output.close()
input.close()

That works, in that it downloads the file without using too much memory, but it seems needlessly inefficient in that it constantly tries to read more data without knowing whether any new data has arrived. My own tests on a very resource-limited VM seem to confirm this: it uses more CPU while getting a lower download speed than a comparable script in Python, and of course than wget.

What I'm wondering is whether there is a way to register a callback that gets called when x bytes are available, or when the end of the file is reached, so I don't have to constantly poll for more data without knowing if there is any.

Edit: If it's not possible with okhttp, I have no problem using something else; it's just the HTTP library I'm used to.


Answer 1:


As of version 11, Java has a built-in HttpClient which implements

asynchronous streams of data with non-blocking back pressure

and that's what you need if you want your code to run only when there's data to process.

If you can afford to upgrade to Java 11, you'll be able to solve your problem out of the box, using the HttpResponse.BodyHandlers.ofFile body handler. You won't have to implement any data transfer logic on your own.

Kotlin example:

import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.file.Paths

fun main() {
    val client = HttpClient.newHttpClient()

    val request = HttpRequest.newBuilder()
            .uri(URI.create("https://www.google.com"))
            .GET()
            .build()

    println("Starting download...")
    client.send(request, HttpResponse.BodyHandlers.ofFile(Paths.get("google.html")))
    println("Done with download.")
}
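The synchronous send above blocks until the download finishes; the same body handler also works with sendAsync, which is where the non-blocking back pressure pays off. Below is a minimal sketch: the local test server, /file path, payload, and download helper are made up for illustration, while HttpClient, sendAsync, and HttpResponse.BodyHandlers.ofFile are the real JDK 11 API.

```kotlin
import com.sun.net.httpserver.HttpServer
import java.net.InetSocketAddress
import java.net.URI
import java.net.http.HttpClient
import java.net.http.HttpRequest
import java.net.http.HttpResponse
import java.nio.file.Files
import java.nio.file.Path

// Streams the response body straight to `target`; no manual read/write loop.
fun download(url: String, target: Path): Path {
    val client = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder().uri(URI.create(url)).GET().build()
    // sendAsync returns a CompletableFuture<HttpResponse<Path>>; the body
    // handler writes chunks to disk as they arrive, driven by the data,
    // instead of a loop that polls the stream.
    return client.sendAsync(request, HttpResponse.BodyHandlers.ofFile(target))
        .join()  // block only so this demo has a definite end
        .body()
}

fun main() {
    // Tiny local server so the demo is self-contained and runs offline.
    val payload = ByteArray(1 shl 20) { (it % 251).toByte() }  // ~1 MiB of test data
    val server = HttpServer.create(InetSocketAddress(0), 0)
    server.createContext("/file") { exchange ->
        exchange.sendResponseHeaders(200, payload.size.toLong())
        exchange.responseBody.use { it.write(payload) }
    }
    server.start()
    try {
        val target = Files.createTempFile("download", ".bin")
        val saved = download("http://localhost:${server.address.port}/file", target)
        println("Downloaded ${Files.size(saved)} bytes to $saved")
    } finally {
        server.stop(0)
    }
}
```

Between sendAsync and join() the calling thread is free to do other work, or you can chain continuations on the returned CompletableFuture instead of joining.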



Answer 2:


One could do away with the BufferedInputStream, whose default buffer size in Oracle's Java is 8192 bytes, and instead read directly into a ByteArray larger than the 1024 above, say 4096 or 8192.

However, best would be to either use java.nio or try Files.copy:

Files.copy(input, file.toPath())

(note that `is` is a reserved keyword in Kotlin, so the stream needs a different name). This removes about 12 lines of code.
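A runnable sketch of that one-liner: `saveStream` and the demo input are hypothetical names, and a plain InputStream stands in for okhttp's `response.body()!!.byteStream()` so the snippet runs without the okhttp dependency; Files.copy itself is the real java.nio API.

```kotlin
import java.io.InputStream
import java.nio.file.Files
import java.nio.file.Path
import java.nio.file.StandardCopyOption

// The whole manual read/write loop collapses into one call.
// `body` stands in for response.body()!!.byteStream() from okhttp.
fun saveStream(body: InputStream, file: Path): Long =
    body.use { Files.copy(it, file, StandardCopyOption.REPLACE_EXISTING) }

fun main() {
    val demo = "hello, Files.copy".byteInputStream()
    val target = Files.createTempFile("copy-demo", ".txt")
    println("copied ${saveStream(demo, target)} bytes to $target")
}
```

Files.copy returns the number of bytes written, which replaces the hand-maintained `total` counter from the question as well.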

Another way is to send the request with the header Accept-Encoding: gzip, so the server can compress the body and the transmission takes less time. When the response carries the header Content-Encoding: gzip, wrap the stream as new GZIPInputStream(input) before reading. Or, if feasible, store the file compressed, with an additional .gz ending; mybiography.md as mybiography.md.gz.
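A sketch of that decompression step, assuming the header check described above. `decodeBody` and `gzip` are hypothetical helper names, `contentEncoding` stands in for okhttp's `response.header("Content-Encoding")`, and the gzip round-trip below simulates what a compressing server would send:

```kotlin
import java.io.ByteArrayOutputStream
import java.io.InputStream
import java.util.zip.GZIPInputStream
import java.util.zip.GZIPOutputStream

// Wrap the raw stream only when the server actually compressed the body.
fun decodeBody(raw: InputStream, contentEncoding: String?): InputStream =
    if (contentEncoding.equals("gzip", ignoreCase = true)) GZIPInputStream(raw) else raw

// Stand-in for server-side compression, used here to simulate a response.
fun gzip(data: ByteArray): ByteArray {
    val buf = ByteArrayOutputStream()
    GZIPOutputStream(buf).use { it.write(data) }
    return buf.toByteArray()
}

fun main() {
    val original = "some compressible text ".repeat(100).toByteArray()
    val compressed = gzip(original)  // what a Content-Encoding: gzip response carries
    val restored = decodeBody(compressed.inputStream(), "gzip").readBytes()
    println("compressed ${original.size} -> ${compressed.size} bytes, " +
            "round-trip ok: ${restored.contentEquals(original)}")
}
```

For an already-compressed target file (video, archives), gzip buys little, so the header is mainly worth it for text-like payloads.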



Source: https://stackoverflow.com/questions/52943766/how-can-i-more-efficently-download-large-files-over-http
