Question
I have a dataframe that looks like this:
link <- c("http://www.sciencedirect.com/science/article/pii/S1042957318300366", "http://www.sciencedirect.com/science/article/pii/S1042957318300664", "http://www.sciencedirect.com/science/article/pii/S1042957318300627", "http://www.sciencedirect.com/science/article/pii/S002205311830156X", "http://www.sciencedirect.com/science/article/pii/S1090951618303419", "http://hdl.handle.net/10.1093/jjfinec/nby006")
repec_id <- c("RePEc:eee:jfinin:v:38:y:2019:i:c:p:19-44", "RePEc:eee:jfinin:v:38:y:2019:i:c:p:1-10", "RePEc:eee:jfinin:v:38:y:2019:i:c:p:58-68", "RePEc:eee:jetheo:v:182:y:2019:i:c:p:329-359", "RePEc:eee:worbus:v:54:y:2019:i:4:p:372-386", "RePEc:oup:jfinec:v:17:y:2019:i:3:p:462-494")
df <- data.frame(repec_id, link)
I have a loop that takes each link and downloads the file it points to (or returns NA if the link is broken and download.file raises a warning or an error). It looks like this:
urls <- df$link
output <- rep(NA, length(urls))
for (i in seq_along(urls)) {
  output[i] <- tryCatch(
    {download.file(urls[i], paste0('~/Desktop/Dataset/', basename(urls[i])))},
    error = function(e) {NA},
    warning = function(w) {NA}
  )
}
However, rather than naming the file using the basename function, I would like to assign the matching repec_id followed by the relevant extension (e.g. '.pdf', '.txt', etc.). In other words, I would like each file that I download to have the relevant repec_id as its name:
list.files()
"RePEc:eee:jfinin:v:38:y:2019:i:c:p:19-44.pdf" "RePEc:eee:jfinin:v:38:y:2019:i:c:p:1-10.txt" "RePEc:eee:jfinin:v:38:y:2019:i:c:p:58-68.aspx" "RePEc:eee:jetheo:v:182:y:2019:i:c:p:329-359.pdf" "RePEc:eee:worbus:v:54:y:2019:i:4:p:372-386.pdf" "RePEc:oup:jfinec:v:17:y:2019:i:3:p:462-494.txt"
Does anyone know how I can do this? I'm a bit stuck. Thanks in advance for your help!
Answer 1:
We could write a function that takes the URL and the name to give the downloaded file.
download_my_file <- function(url, name) {
  # Save the file under `name`; return NA if download.file warns or errors
  tryCatch(
    {download.file(url, paste0('~/Desktop/Dataset/', name))},
    error = function(e) {NA},
    warning = function(w) {NA})
}
Then use Map to call it for every link and its matching repec_id:
Map(download_my_file, df$link, df$repec_id)
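The call above names each file with the bare repec_id. As a rough sketch of how an extension could be appended, the snippet below builds the destination paths without downloading anything. The helper make_dest, the example.com URLs, and the use of tempdir() are illustrative assumptions, not part of the original answer. It takes the extension from the last path component of the URL, so links that carry no extension (like the ones in the question) keep the bare repec_id.

```r
# Hypothetical helper (not from the original answer): build a destination
# path from a URL and a repec_id, appending the URL's extension if it has one.
make_dest <- function(url, name, dir = tempdir()) {
  ext <- tools::file_ext(basename(url))   # "" when there is no extension
  if (nzchar(ext)) name <- paste0(name, ".", ext)
  file.path(dir, name)
}

# Made-up URLs for illustration: one with an extension, one without
urls <- c("http://example.com/a/paper.pdf",
          "http://example.com/b/S1042957318300366")
ids  <- c("RePEc:eee:jfinin:v:38:y:2019:i:c:p:19-44",
          "RePEc:eee:jfinin:v:38:y:2019:i:c:p:1-10")

# Map pairs the i-th URL with the i-th id, just like the download call
dests <- unlist(Map(make_dest, urls, ids), use.names = FALSE)
basename(dests)
```

The resulting paths could then be passed as the second argument of download_my_file in place of the bare repec_id. Note that colons are not allowed in file names on Windows, so the ids may need to be sanitised there.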
data
link should be a character vector, so set stringsAsFactors = FALSE when creating the dataframe (this is already the default from R 4.0.0 onward).
df <- data.frame(repec_id, link, stringsAsFactors = FALSE)
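To see why this matters, here is a small sketch comparing the two settings. The data frames df_fct and df_chr and the example.com links are made up for illustration. With stringsAsFactors = TRUE the column is stored as a factor, so it would need as.character() before being used as a URL.

```r
# Made-up links for illustration only
links <- c("http://example.com/a", "http://example.com/b")

df_fct <- data.frame(link = links, stringsAsFactors = TRUE)
df_chr <- data.frame(link = links, stringsAsFactors = FALSE)

class(df_fct$link)   # factor
class(df_chr$link)   # character

# A factor column can be converted back to plain strings explicitly
recovered <- as.character(df_fct$link)
```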
Source: https://stackoverflow.com/questions/61240941/how-to-name-downloaded-files-using-strings-from-another-column-in-the-same-dataf