Download files with a specific extension from a website [closed]

Question

How can I download the content of a webpage, find all files with a specific extension listed on it, and then download all of them? For example, I would like to download all netCDF files (with extension *.nc4) from the following webpage: https://data.giss.nasa.gov/impacts/agmipcf/agmerra/.

I was advised to look into the RCurl package but could not find out how to do this with it.

Answer 1:

library(stringr)

# Get the content of the page, one element per line of HTML
thepage <- readLines('https://data.giss.nasa.gov/impacts/agmipcf/agmerra/')

# Find the lines that contain the names of netcdf files
# (the dot is escaped so it matches a literal ".")
nc4.lines <- grep('\\.nc4', thepage)

# Subset the page, keeping only those lines
thepage <- thepage[nc4.lines]

# Locate the file names: from the first "A" up to the closing quote
str.loc <- str_locate(thepage, 'A.*nc4?"')

# Extract the file names, dropping the trailing quote
file.list <- substring(thepage, str.loc[,1], str.loc[,2]-1)

# Download all files; mode="wb" writes them as binary
for (ifile in file.list) {
  download.file(paste0("https://data.giss.nasa.gov/impacts/agmipcf/agmerra/",
                       ifile),
                destfile=ifile, method="libcurl", mode="wb")
}
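
The file-name extraction step above can also be sketched in base R without stringr. This is only an illustration, not part of the original answer: the helper name extract_links is hypothetical, the sample HTML lines and file names are made up for the example, and it assumes the page writes its links as href="..." with double quotes.

```r
# Hypothetical helper: pull href targets ending in a given extension
# out of raw HTML lines. Assumes links look like href="name.ext".
extract_links <- function(html_lines, ext = "nc4") {
  # Match href="...": anything without a quote, ending in .<ext>
  pattern <- paste0('href="[^"]*\\.', ext, '"')
  hits <- unlist(regmatches(html_lines, gregexpr(pattern, html_lines)))
  # Strip the leading href=" and the trailing quote
  links <- sub('^href="', "", hits)
  links <- sub('"$', "", links)
  unique(links)
}

# Made-up sample lines standing in for the output of readLines(url)
sample_html <- c(
  '<a href="AgMERRA_1980_prate.nc4">AgMERRA_1980_prate.nc4</a>',
  '<a href="readme.txt">readme.txt</a>',
  '<a href="AgMERRA_1981_prate.nc4">AgMERRA_1981_prate.nc4</a>'
)
files <- extract_links(sample_html)
print(files)
```

Each element of files could then be appended to the base URL and passed to download.file() with mode="wb", as in the loop above. Matching on the href attribute rather than on a leading "A" means the helper does not depend on how the file names start.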

Source: https://stackoverflow.com/questions/50164561/download-files-with-specific-extension-from-a-website

Tags

netcdf

rcurl
