Downloading large files with R/RCurl efficiently

Submitted by 白昼怎懂夜的黑 on 2019-12-12 07:09:46

Question


Many examples of downloading binary files with RCurl look like this:

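library("RCurl")
curl = getCurlHandle()
# getBinaryURL() returns the whole file as a raw vector held in memory
bfile = getBinaryURL(
        "http://www.example.com/bfile.zip",
        curl = curl,
        progressfunction = function(down, up) {print(down)}, noprogress = FALSE
)
# only after the full download is the raw vector written to disk
writeBin(bfile, "bfile.zip")
rm(curl, bfile)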

If the download is very large, I suppose it would be better to write it to disk as it arrives, instead of holding the whole file in memory first.

The RCurl documentation has some examples that fetch files in chunks and process them as they are downloaded, but they all seem to deal with text chunks.
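For comparison, the write-as-it-arrives idea itself can be sketched with base R connections alone, without RCurl. This is a minimal sketch, not the RCurl solution the question asks for: `stream_to_file` and its `chunk_size` parameter are hypothetical names, and because it relies on `url()` it does not offer RCurl's cookie and form handling.

```r
# Sketch (hypothetical helper, not part of RCurl): copy a URL to disk in
# fixed-size binary chunks so only one chunk at a time is held in R.
stream_to_file <- function(src_url, destfile, chunk_size = 64 * 1024) {
  con <- url(src_url, open = "rb")     # read side, binary mode
  on.exit(close(con), add = TRUE)
  out <- file(destfile, open = "wb")   # write side, binary mode
  on.exit(close(out), add = TRUE)

  total <- 0
  repeat {
    chunk <- readBin(con, what = "raw", n = chunk_size)
    if (length(chunk) == 0) break      # zero bytes read: end of stream
    writeBin(chunk, out)               # append this chunk to the file
    total <- total + length(chunk)
  }
  total                                # number of bytes written
}
```

A real call would look like `stream_to_file("http://www.example.com/bfile.zip", "bfile.zip")`. The loop stops only on an empty read, so short reads from a slow server are handled correctly.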

Can you give a working example?

UPDATE

One user suggests using R's native download.file() function with the mode = 'wb' option for binary files.

In many cases the native function is a viable alternative, but there are a number of use cases it does not cover (HTTPS, cookies, forms, etc.), and that is why RCurl exists.


Answer 1:


Here is a working example:

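library(RCurl)
# CFILE() creates a C-level file handle that libcurl can write into directly
f = CFILE("bfile.zip", mode="wb")
# writedata = f@ref sends the response body straight to that file
curlPerform(url = "http://www.example.com/bfile.zip", writedata = f@ref)
close(f)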

This downloads straight to the file. Instead of the downloaded data, curlPerform() returns the status of the request (0 if no errors occur).

The RCurl manual's coverage of CFILE is rather terse; hopefully future versions will include more details and examples.

For convenience, here is the same code packaged as a function (with a progress bar):

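bdown = function(url, file) {
    library('RCurl')
    f = CFILE(file, mode = "wb")
    # close the file handle even if curlPerform() raises an error
    on.exit(close(f))
    # noprogress = FALSE switches on libcurl's progress meter
    a = curlPerform(url = url, writedata = f@ref, noprogress = FALSE)
    return(a)
}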

## ...and now just give remote and local paths     
ret = bdown("http://www.example.com/bfile.zip", "path/to/bfile.zip")



Answer 2:


Um, just use mode = 'wb' :) Run this and follow along with my comments.

# create a temporary file and a temporary directory on your local disk
tf <- tempfile()
td <- tempdir()

# run download.file() in binary mode ('wb') and save the result to the temporary file
download.file(
    "http://sourceforge.net/projects/peazip/files/4.8/peazip_portable-4.8.WINDOWS.zip/download",
    tf,
    mode = 'wb'
)

# unzip the files to the temporary directory
files <- unzip( tf , exdir = td )

# here are your files
files
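download.file() also returns an invisible integer status, 0 for success and non-zero for failure, which plays the same role as the status curlPerform() returns in the first answer. A minimal sketch of checking it before unzipping, using a hypothetical helper name `download_binary`; note that some download methods raise an R error on failure instead of returning a non-zero code, so the check only covers the latter case.

```r
# Hypothetical helper: download in binary mode ("wb") and check the status
# that download.file() returns invisibly (0 means success).
download_binary <- function(url, destfile) {
  status <- download.file(url, destfile, mode = "wb", quiet = TRUE)
  if (status != 0L) {
    stop("download.file() failed with status ", status)
  }
  file.size(destfile)   # bytes written, as a quick sanity check
}
```

In the example above, `download_binary(url, tf)` could stand in for the bare download.file() call, before `unzip(tf, exdir = td)`.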


Source: https://stackoverflow.com/questions/14426359/downloading-large-files-with-r-rcurl-efficiently
