How to name downloaded files using strings from another column in the same dataframe?

Submitted by 让人想犯罪 __ on 2021-02-11 15:51:57

Question


I have a dataframe that looks like this:

link <- c("http://www.sciencedirect.com/science/article/pii/S1042957318300366", "http://www.sciencedirect.com/science/article/pii/S1042957318300664", "http://www.sciencedirect.com/science/article/pii/S1042957318300627", "http://www.sciencedirect.com/science/article/pii/S002205311830156X", "http://www.sciencedirect.com/science/article/pii/S1090951618303419", "http://hdl.handle.net/10.1093/jjfinec/nby006")
repec_id <- c("RePEc:eee:jfinin:v:38:y:2019:i:c:p:19-44", "RePEc:eee:jfinin:v:38:y:2019:i:c:p:1-10", "RePEc:eee:jfinin:v:38:y:2019:i:c:p:58-68", "RePEc:eee:jetheo:v:182:y:2019:i:c:p:329-359", "RePEc:eee:worbus:v:54:y:2019:i:4:p:372-386", "RePEc:oup:jfinec:v:17:y:2019:i:3:p:462-494")
df <- data.frame(repec_id, link)

I have a loop that takes each of the links and downloads the file it leads to (or returns NA if the link is broken and download.file raises a warning or an error). It looks like this:

urls <- df$link
output <- rep(NA, length(urls))
for (i in seq_along(urls)) {
  output[i] <- tryCatch(
    {download.file(urls[i], paste0('~/Desktop/Dataset/', basename(urls[i])))}, 
    error = function(e) {NA},
    warning = function(w) {NA}
  )
}

However, rather than naming the file using the basename function, I would like to assign the matching repec_id followed by the relevant extension (e.g. '.pdf', '.txt', etc). In other words, I would like each file that I download to have the relevant repec_id as its name:

list.files()
"RePEc:eee:jfinin:v:38:y:2019:i:c:p:19-44.pdf" "RePEc:eee:jfinin:v:38:y:2019:i:c:p:1-10.txt"  "RePEc:eee:jfinin:v:38:y:2019:i:c:p:58-68.aspx" "RePEc:eee:jetheo:v:182:y:2019:i:c:p:329-359.pdf" "RePEc:eee:worbus:v:54:y:2019:i:4:p:372-386.pdf" "RePEc:oup:jfinec:v:17:y:2019:i:3:p:462-494.txt"

Does anyone know how I can do this? I'm a bit stuck. Thanks in advance for your help!


Answer 1:


We could write a function that takes the URL and the name of the file as arguments,

download_my_file <- function(url, name) {
  # Return NA instead of stopping when a download fails or warns
  tryCatch(
    {download.file(url, paste0('~/Desktop/Dataset/', name))},
    error = function(e) {NA},
    warning = function(w) {NA}
  )
}

and use Map to pass it for every link and repec_id.

Map(download_my_file, df$link, df$repec_id)
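The call above names each file by its repec_id alone, without the extension the question asks for. A minimal sketch of one way to build the full destination path first is shown below. The helper make_dest, its ext and dir arguments, and the replacement of ':' with '_' are assumptions for illustration, not part of the original answer; the colon is replaced because it is not allowed in file names on Windows. The extension has to be supplied by the caller, since the URLs in the question do not carry one.

```r
# Hypothetical helper (not from the original answer): build a destination
# path from a repec_id and a caller-supplied file extension.
make_dest <- function(id, ext = "pdf", dir = "~/Desktop/Dataset") {
  # Assumption: replace ':' so the name is also valid on Windows
  safe_id <- gsub(":", "_", id, fixed = TRUE)
  # file.path() joins the directory and file name with "/"
  file.path(dir, paste0(safe_id, ".", ext))
}

ids <- c("RePEc:eee:jfinin:v:38:y:2019:i:c:p:19-44",
         "RePEc:eee:jfinin:v:38:y:2019:i:c:p:1-10")

# The helper is vectorised, so it returns one path per id
dests <- make_dest(ids, ext = c("pdf", "txt"))
print(dests)
```

With such a helper, download.file(url, make_dest(name)) could replace the paste0() call inside download_my_file, if renaming the colons is acceptable.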

Data

link should be a character vector, so use stringsAsFactors = FALSE when creating the dataframe (this is the default from R 4.0.0 onward).

df <- data.frame(repec_id, link, stringsAsFactors = FALSE)
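If the dataframe already exists and its columns are factors, converting them in place is a possible alternative to recreating it. The sketch below uses a one-row toy dataframe named df_old, which is an assumption for illustration only.

```r
# Toy dataframe with factor columns (df_old is a hypothetical name)
df_old <- data.frame(
  repec_id = "RePEc:eee:jfinin:v:38:y:2019:i:c:p:19-44",
  link = "http://www.sciencedirect.com/science/article/pii/S1042957318300366",
  stringsAsFactors = TRUE
)

# Find the factor columns and convert each one to character
is_fac <- vapply(df_old, is.factor, logical(1))
df_old[is_fac] <- lapply(df_old[is_fac], as.character)

str(df_old)
```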


Source: https://stackoverflow.com/questions/61240941/how-to-name-downloaded-files-using-strings-from-another-column-in-the-same-dataf
