New member here. I'm trying to download a large number of files from a website in R (but I'm open to other suggestions as well, such as wget).
From this post, I understand I must
This should do the job:
agency <- c("FAA", "DEA", "NTSB")
states <- c("AL", "AK", "AZ", "AR")
# Pair every agency with every state (3 x 4 = 12 URLs, no duplicates)
URLs <-
  paste0("http://website.gov/",
         rep(agency, each = length(states)),
         "_",
         rep(states, times = length(agency)),
         ".zip")
Then loop through the URLs vector to pull the zip files. An apply function makes the loop more concise, though the downloads are network-bound, so it is unlikely to be noticeably faster than a for loop.
If the set of agency codes is the same for every state code, you can use the code below to build the vector of URLs to loop through. (You will also need a vector of destinations of the same length.)
#Getting all combinations
States <- c("AA","BB")
Agency <- c("ABCDEFG","HIJKLMN")
AllCombinations <- expand.grid(States, Agency)
AllCombinationsVec <- paste0("http://website.gov/", AllCombinations$Var1, "_", AllCombinations$Var2, ".zip")
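The vector of destinations mentioned above is not built anywhere in this answer. A minimal sketch, assuming you want each file saved under its URL's file name in a local folder; the folder name `downloads` and the variable `dest_dir` are my assumptions, not something from the question:

```r
# Rebuild the URL vector from the snippet above so this block runs on its own
States <- c("AA", "BB")
Agency <- c("ABCDEFG", "HIJKLMN")
AllCombinations <- expand.grid(States, Agency)
AllCombinationsVec <- paste0("http://website.gov/", AllCombinations$Var1, "_",
                             AllCombinations$Var2, ".zip")

# Hypothetical output folder; change to wherever the zips should go
dest_dir <- "downloads"
dir.create(dest_dir, showWarnings = FALSE)

# One destination per URL, named after the file part of the URL
destinations <- file.path(dest_dir, basename(AllCombinationsVec))
```

Because `destinations` is derived from `AllCombinationsVec`, the two vectors always have the same length and order.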
You can then try looping through each file something like this:
#loop method
for (i in seq_along(AllCombinationsVec)) {
  download.file(AllCombinationsVec[i], destinations[i], mode = "wb")
}
Another way to loop through the items is the apply family, which applies a function to every item in a list or vector.
#mapply method
mapply(function(x, y) download.file(x,y, mode="wb"),x = AllCombinationsVec, y = destinations)
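With this many URLs, some combinations may not exist on the server, and a single download.file() call that errors stops the whole loop. A sketch of one way around that, wrapping each call in tryCatch(); the helper name `safe_download` and its `downloader` argument are hypothetical, the latter only there so the wrapper can be tried without a network connection:

```r
# Hypothetical helper: returns TRUE on success, FALSE if the download errored
safe_download <- function(url, dest, downloader = download.file) {
  tryCatch({
    status <- downloader(url, dest, mode = "wb")
    status == 0  # download.file() returns 0 on success
  }, error = function(e) {
    message("Failed: ", url, " (", conditionMessage(e), ")")
    FALSE
  })
}

# Dry run with a stand-in downloader that fails for one URL
fake <- function(url, dest, mode) {
  if (grepl("missing", url)) stop("404 Not Found")
  0L
}
urls <- c("http://website.gov/AA_ABCDEFG.zip", "http://website.gov/missing.zip")
ok <- mapply(safe_download, urls, basename(urls),
             MoreArgs = list(downloader = fake))
urls[!ok]  # the URLs that would need a retry
```

For the real run, drop the `MoreArgs` argument so the default download.file() is used, and pass your own URL and destination vectors.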
This will download them in batches and take advantage of the faster simultaneous downloads that download.file() supports when the libcurl method is available on your installation of R:
library(purrr)
states <- state.abb[1:27]
agencies <- c("AID", "AMBC", "AMTRAK", "APHIS", "ATF", "BBG", "DOJ", "DOT",
"BIA", "BLM", "BOP", "CBFO", "CBP", "CCR", "CEQ", "CFTC", "CIA",
"CIS", "CMS", "CNS", "CO", "CPSC", "CRIM", "CRT", "CSB", "CSOSA",
"DA", "DEA", "DHS", "DIA", "DNFSB", "DOC", "DOD", "DOE", "DOI")
# One batch of simultaneous downloads per state
walk(states, function(x) {
  map(x, ~sprintf("http://website.gov/%s_%s.zip", ., agencies)) %>%
    flatten_chr() -> urls
  download.file(urls, basename(urls), method = "libcurl", mode = "wb")
})
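Whether libcurl is available can be checked before running the batch version. A small sketch using capabilities(); falling back to the default method one URL at a time is my assumption about what you would want, and `fetch_all` is a hypothetical name:

```r
# TRUE if this build of R was compiled with libcurl support
has_libcurl <- isTRUE(unname(capabilities("libcurl")))

# Hypothetical wrapper: one vectorised call with libcurl,
# otherwise one file at a time with the default method
fetch_all <- function(urls, dests = basename(urls)) {
  if (has_libcurl) {
    download.file(urls, dests, method = "libcurl", mode = "wb")
  } else {
    for (i in seq_along(urls)) download.file(urls[i], dests[i], mode = "wb")
  }
}
```

Inside the walk() call above, `fetch_all(urls)` could then stand in for the direct download.file() call.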