Question
I'm trying to scrape the table at the end of this web page, below "Stockholm Weather History for...": https://www.timeanddate.com/weather/sweden/stockholm/historic?month=3&year=2020
With the code below I can get the information for the 1st of the month, but I don't understand how to get it for the rest of the days. If I change the date in the drop-down list, the URL doesn't change. How can I scrape that table for all days of the month?
library(tidyverse)
library(rvest)
library(RSelenium)
library(stringr)
library(dplyr)
rD <- rsDriver(browser="chrome", port=4234L, chromever ="85.0.4183.83")
remDr <- rD[["client"]]
remDr$navigate("https://www.timeanddate.com/weather/sweden/stockholm/historic?month=3&year=2020")
webElems <- remDr$findElements(using="class name", value="sticky-wr")
s<-webElems[[1]]$getElementText()
s<-as.character(s)
print(s)
Answer 1:
It looks like you can extract the table with rvest itself and don't need RSelenium here, although the table might require some cleaning.
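library(rvest)
url <- 'https://www.timeanddate.com/weather/sweden/stockholm/historic?month=3&year=2020'
# Read the page, parse every HTML table on it, and keep the third one.
url %>%
read_html() %>%
html_table() %>%
.[[3]] %>%
# Use the first row as the column names.
setNames(.[1, ]) -> tmp
# Drop the first row (now used as the names) and the last row.
tmp[-c(1, nrow(tmp)), ]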
# Time Temp Weather Wind Humidity Barometer Visibility
#2 0:20.Aha 01 Mac 2 °C Light rain. Mostly cloudy. 20 km/h ↑ 93% 988 mbar 5 km
#3 0:50. 2 °C Drizzle. Low clouds. 13 km/h ↑ 93% 988 mbar N/A
#4 1:20. 2 °C Drizzle. Low clouds. 15 km/h ↑ 100% 987 mbar 9 km
#5 1:50. 2 °C Drizzle. Low clouds. 15 km/h ↑ 100% 987 mbar 8 km
#6 2:20. 2 °C Light rain. Low clouds. 19 km/h ↑ 100% 986 mbar 6 km
#7 2:50. 2 °C Light rain. Low clouds. 19 km/h ↑ 100% 985 mbar 4 km
#...
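As for the cleaning mentioned above, here is a minimal sketch of one way to do it in base R. It does not fetch anything: `sample_rows` is a hand-typed stand-in modelled on the output printed above, and the helper `first_number()` and the output column names are made up for illustration. It assumes the time always comes first in the Time column and that negative temperatures use a plain ASCII minus sign, which I have not checked against the live page.

```r
# Hand-typed stand-in for a few rows of the scraped table (an assumption,
# modelled on the printed output above; not fetched from the site).
sample_rows <- data.frame(
  Time       = c("0:20.Aha 01 Mac", "0:50.", "1:20."),
  Temp       = c("2 °C", "2 °C", "2 °C"),
  Humidity   = c("93%", "93%", "100%"),
  Barometer  = c("988 mbar", "988 mbar", "987 mbar"),
  Visibility = c("5 km", "N/A", "9 km"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the first integer found in each string,
# or NA when the string contains no digits (e.g. "N/A").
first_number <- function(x) {
  m <- regexpr("-?[0-9]+", x)
  hit <- !is.na(m) & m > -1
  out <- rep(NA_real_, length(x))
  out[hit] <- as.numeric(regmatches(x, m))
  out
}

clean <- data.frame(
  # Keep only the leading h:mm part and drop the fused day label.
  time           = sub("^([0-9]{1,2}:[0-9]{2}).*$", "\\1", sample_rows$Time),
  temp_c         = first_number(sample_rows$Temp),
  humidity_pct   = first_number(sample_rows$Humidity),
  barometer_mbar = first_number(sample_rows$Barometer),
  visibility_km  = first_number(sample_rows$Visibility),
  stringsAsFactors = FALSE
)

print(clean)
```

The same calls should work on the real result if you swap `sample_rows` for `tmp[-c(1, nrow(tmp)), ]`, provided the column names match.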
Source: https://stackoverflow.com/questions/64471950/how-do-i-scrape-information-in-this-table-using-r