Using R to scrape the link address of a downloadable file from a web page?

孤独总比滥情好 2021-02-08 00:33

I'm trying to automate a process that involves downloading .zip files from a couple of web pages and extracting the .csvs they contain. The challenge is that the .zip file name

1 Answer
  • 2021-02-08 01:24

    I think you're trying to do too much in a single XPath expression; I'd attack the problem in a sequence of smaller steps:

    library(rvest)
    library(stringr)
    page <- read_html("http://www.acleddata.com/data/realtime-data-2015/")  # read_html() replaces the deprecated html()
    
    page %>%
      html_nodes("a") %>%       # find all links
      html_attr("href") %>%     # get the url
      str_subset("\\.xlsx") %>% # keep links that point to .xlsx files
      .[[1]]                    # look at the first one
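
    Once that gives you the link address, the download-and-extract part of the question can be handled with base R. This is a minimal sketch, assuming the URL found above is stored in `zip_url` and that the archive actually contains .csv files; the object names here are illustrative, not from the original answer:

    # zip_url is assumed to hold the href selected above
    tmp <- tempfile(fileext = ".zip")
    download.file(zip_url, tmp, mode = "wb")    # binary mode so the archive isn't corrupted
    csv_files <- unzip(tmp, exdir = tempdir())  # extract and return the paths of the extracted files
    data <- read.csv(csv_files[[1]])            # read the first extracted .csv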
    