Question
I have a probably rather basic question about using the download.file function in R with the wget method and some of wget's extra options, but I just cannot get it to work.
What I want to do: download a local copy of a webpage (actually several webpages, but for now the challenge is to get it to work even with 1).
Challenge: I need the local copy to look exactly like the online version, which also means including links, icons, etc. I found wget to be a good tool for this, and I would like to specify some of its extra options, such as --random-wait, -p, and -r. I found some very helpful tutorials on this; however, none of them used the extra options from within R, only in wget directly.
So here is the code I have put together for this:
download.file('https://www.wikipedia.org/', destfile = "wikipage", method = "wget", extra = getOption("--random wait", "-r", "-p"))
which does not work. I suspect there are problems with both the "wget" method and the specification of the extras.
Can anyone help? It would be much appreciated.
A bonus question: I know that destfile is supposed to specify a file name for the downloaded document, but is there any way I could specify a folder, through a path, to which all downloaded files should be saved?
Thank you in advance!
Best, Carolin
Answer 1:
You can specify multiple options directly in the extra argument, without getOption().
Further, you can simply include the path to the file where you want to save your download in destfile.
download.file('https://www.wikipedia.org/', destfile = "mydirectory/wikipage.html", method = "wget", extra = "-r -p --random-wait")
You will, however, run into the problem that wget attempts to save all downloaded items into the same destfile.
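To make the role of the extra argument more concrete: in the help page, getOption() only appears as the default, extra = getOption("download.file.extra"), which is presumably where the call in the question came from. The sketch below builds the flag string explicitly instead. The wrapper name fetch_with_wget is hypothetical and only illustrates the call shape; the download itself is left commented out because it needs wget and network access.

```r
# Collapse the individual wget flags into the single string passed as `extra`.
flags <- c("-r", "-p", "--random-wait")
extra_flags <- paste(flags, collapse = " ")   # "-r -p --random-wait"

# Hypothetical wrapper around download.file() using the wget method.
fetch_with_wget <- function(url, destfile, extra) {
  download.file(url, destfile = destfile, method = "wget", extra = extra)
}

# Example call (requires wget on the PATH and an existing "mydirectory"):
# fetch_with_wget("https://www.wikipedia.org/",
#                 file.path("mydirectory", "wikipage.html"),
#                 extra_flags)
```

Keeping the flags in a vector makes it easy to add or drop one without retyping the whole string.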
Note that there was a similar question a while ago (I only saw it now). The suggested solution was to use system() instead of download.file to run the wget command. Adapted to your problem:
setwd("./mydirectory")
system("wget http://www.wikipedia.org -p -k --random-wait")
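As a possible variant of the same idea, the sketch below uses system2() together with wget's -P (--directory-prefix) option, so that the files land in a chosen folder without changing R's working directory. The helper names wget_args and mirror_page are hypothetical, and the actual call is commented out because it needs wget and network access.

```r
# Hypothetical helper: assemble the argument vector for a wget call that
# saves one page plus its requisites under `dest_dir`.
wget_args <- function(url, dest_dir) {
  c("-p",                      # also fetch images, CSS, etc. needed by the page
    "-k",                      # convert links so the local copy works offline
    "--random-wait",           # randomise the pause between requests
    "-P", shQuote(dest_dir),   # directory prefix: save everything under dest_dir
    shQuote(url))
}

# Hypothetical helper: create the folder if needed, then run wget.
mirror_page <- function(url, dest_dir, wget = "wget") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  system2(wget, args = wget_args(url, dest_dir))  # returns wget's exit status
}

# Example call (requires wget on the PATH):
# mirror_page("https://www.wikipedia.org/", "mydirectory")
```

Because wget chooses the file names itself here, this avoids the single-destfile limitation of download.file().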
Edit: Please also note that both commands will only work on systems with wget installed. On Linux/BSD/Mac, the package to install is usually called wget. On Windows, wget is (according to the download.file() help) available from packages like gnuwin32 and Cygwin. In this case, the system() command may still not work if the system does not know where the wget executable is; you may then need to specify the absolute path to the wget executable.
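One way to check whether R can see wget before calling it is Sys.which(), which returns the full path of an executable found on the PATH. The sketch below is only an illustration: find_wget is a hypothetical helper, and the fallback path is something you would have to supply for your own installation.

```r
# Hypothetical helper: return a usable path to wget, or stop with a message.
find_wget <- function(fallback = NULL) {
  found <- unname(Sys.which("wget"))          # typically "" if not on the PATH
  if (!is.na(found) && nzchar(found)) return(found)
  if (!is.null(fallback) && file.exists(fallback)) return(fallback)
  stop("wget not found; install it or pass its full path as `fallback`")
}

# Example use (the fallback path is a placeholder, not a real location):
# wget_path <- find_wget(fallback = "C:/path/to/wget.exe")
# system2(wget_path, args = c("-p", "-k", shQuote("https://www.wikipedia.org/")))
```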
Source: https://stackoverflow.com/questions/50293314/r-download-file-with-wget-method-and-specifying-extra-wget-options