Downloading multiple files in R with variable length, nested URLs

青春惊慌失措 2021-01-20 10:58

New member here. I am trying to download a large number of files from a website in R (but I am open to other suggestions as well, such as wget).

From this post, I understand I must

3 answers
  • 2021-01-20 11:20

    This should do the job:

    agency <- c("FAA", "DEA", "NTSB")
    states <- c("AL", "AK", "AZ", "AR")
    
    # Pair every agency with every state (3 x 4 = 12 URLs)
    URLs <-
      paste0("http://website.gov/",
             rep(agency, each = length(states)),
             "_",
             rep(states, times = length(agency)),
             ".zip")
    

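    Then loop through the URLs vector to download each zip file, either with a for loop or with an apply-family function. The apply version is more compact, but the download time is dominated by the network either way.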
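
    A loop over the URLs might look like the following sketch. The helper name `download_all` and the `downloads` folder are hypothetical, and the three example URLs only mimic the pattern above. Each download is wrapped in `tryCatch()` on the assumption that some agency/state combinations may not exist on the server, so one failure does not stop the rest.

    ```r
    # Example URLs in the same pattern as above (website.gov is the placeholder host)
    URLs <- paste0("http://website.gov/", c("FAA_AL", "FAA_AK", "DEA_AL"), ".zip")

    # Hypothetical helper: download each URL into dest_dir and report success per URL
    download_all <- function(urls, dest_dir = "downloads") {
      dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
      dest <- file.path(dest_dir, basename(urls))
      ok <- vapply(seq_along(urls), function(i) {
        tryCatch(
          # download.file() returns 0 on success; mode = "wb" keeps zip files intact
          download.file(urls[i], dest[i], mode = "wb", quiet = TRUE) == 0,
          error = function(e) FALSE
        )
      }, logical(1))
      names(ok) <- urls
      ok
    }

    # Not run here because it needs network access:
    # results <- download_all(URLs)
    # names(results)[!results]   # URLs that failed
    ```

    The returned logical vector makes it easy to retry only the URLs that failed.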

  • 2021-01-20 11:38

    If every state uses the same set of agency codes, you can use the code below to create the vector of URLs to loop through. (You will also need a vector of destination paths of the same length.)

    # Getting all combinations
    States <- c("AA", "BB")
    Agency <- c("ABCDEFG", "HIJKLMN")
    AllCombinations <- expand.grid(States, Agency)
    AllCombinationsVec <- paste0("http://website.gov/", AllCombinations$Var1, "_", AllCombinations$Var2, ".zip")
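
    The `destinations` vector is not defined in this answer. One way to build it, as a sketch: reuse each URL's file name and save it under a local folder. The folder name `downloads` is an assumption, and the folder has to exist before any download starts.

    ```r
    # Same URL construction as above, repeated so this block runs on its own
    States <- c("AA", "BB")
    Agency <- c("ABCDEFG", "HIJKLMN")
    AllCombinations <- expand.grid(States, Agency, stringsAsFactors = FALSE)
    AllCombinationsVec <- paste0("http://website.gov/", AllCombinations$Var1,
                                 "_", AllCombinations$Var2, ".zip")

    # Hypothetical target folder; create it first with dir.create(dest_dir)
    dest_dir <- "downloads"

    # One destination path per URL, named after the file part of the URL
    destinations <- file.path(dest_dir, basename(AllCombinationsVec))
    ```

    Because `destinations` is derived from `AllCombinationsVec`, the two vectors always have the same length and order.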
    

    You can then loop over the files, something like this:

    # loop method (destinations must hold one file path per URL)
    
    for (i in seq_along(AllCombinationsVec)) {
      download.file(AllCombinationsVec[i], destinations[i], mode = "wb")
    }
    

    Another way to loop is with the apply family, which applies a function to every item in a list or vector. Here mapply walks over the URLs and destinations in parallel:

    # mapply method
    
    mapply(function(x, y) download.file(x, y, mode = "wb"), x = AllCombinationsVec, y = destinations)
    
  • 2021-01-20 11:41

    This will download them in batches and take advantage of the speedier simultaneous downloading capabilities of download.file() if the libcurl option is available on your installation of R:

    library(purrr)
    
    states <- state.abb[1:27]
    agencies <- c("AID", "AMBC", "AMTRAK", "APHIS", "ATF", "BBG", "DOJ", "DOT",
                  "BIA", "BLM", "BOP", "CBFO", "CBP", "CCR", "CEQ", "CFTC", "CIA",
                  "CIS", "CMS", "CNS", "CO", "CPSC", "CRIM", "CRT", "CSB", "CSOSA",
                  "DA", "DEA", "DHS", "DIA", "DNFSB", "DOC", "DOD", "DOE", "DOI")
    
    walk(states, function(x) {
      # All agency URLs for one state form one simultaneous batch
      urls <- sprintf("http://website.gov/%s_%s.zip", x, agencies)
      download.file(urls, basename(urls), method = "libcurl")
    })
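
    To check whether libcurl is available before relying on batch downloads, something like the sketch below may help. It uses base R only, a shortened list of states and agencies, and a hypothetical helper named `download_batch` that falls back to one-at-a-time downloads when libcurl is missing.

    ```r
    states <- state.abb[1:3]
    agencies <- c("AID", "AMBC")

    # One character vector of URLs per state, named by state abbreviation
    urls_by_state <- lapply(states, function(s) {
      sprintf("http://website.gov/%s_%s.zip", s, agencies)
    })
    names(urls_by_state) <- states

    # Hypothetical helper: batch download with libcurl, else download sequentially
    download_batch <- function(urls) {
      if (isTRUE(capabilities("libcurl"))) {
        download.file(urls, basename(urls), method = "libcurl")
      } else {
        for (u in urls) download.file(u, basename(u), mode = "wb")
      }
    }

    # Not run here because it needs network access:
    # for (urls in urls_by_state) download_batch(urls)
    ```

    Batching per state keeps each simultaneous request to the number of agencies instead of sending every URL at once.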
    