Question
library(rvest)
df <- data.frame(Links = c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8"))
for(i in 1:3) {
  webpage <- read_html(paste0("https://www.whatmobile.com.pk/", df$Links[i]))
  data <- webpage %>%
    html_nodes(".specs") %>%
    .[[1]] %>%
    html_table(fill = TRUE)
}
I want the loop to work for all 3 values in df$Links, but the code above only gives me the last one. The downloaded data must also be identifiable by model (maybe via a new column holding the value from df$Links).
Answer 1:
The problem is in how you're structuring your for loop. It's much easier just to not use one in the first place, though, as R has great support for iterating over lists, like lapply and purrr::map. One version of how you could structure your data:
library(tidyverse)
library(rvest)
base_url <- "https://www.whatmobile.com.pk/"
# One row per model; `page` is a list column holding each parsed HTML document.
# tibble() replaces the deprecated data_frame().
models <- tibble(model = c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8"),
                 link = paste0(base_url, model),
                 page = map(link, read_html))
model_specs <- models %>%
  mutate(node = map(page, html_node, '.specs'),
         specs = map(node, html_table, header = TRUE, fill = TRUE),
         specs = map(specs, set_names, c('var1', 'var2', 'val1', 'val2'))) %>%
  select(model, specs) %>%
  unnest(specs)  # name the list column explicitly; a bare unnest() warns in current tidyr
model_specs
#> # A tibble: 119 x 5
#> model var1 var2
#> <chr> <chr> <chr>
#> 1 Qmobile_Noir-M6 Build OS
#> 2 Qmobile_Noir-M6 Build Dimensions
#> 3 Qmobile_Noir-M6 Build Weight
#> 4 Qmobile_Noir-M6 Build SIM
#> 5 Qmobile_Noir-M6 Build Colors
#> 6 Qmobile_Noir-M6 Frequency 2G Band
#> 7 Qmobile_Noir-M6 Frequency 3G Band
#> 8 Qmobile_Noir-M6 Frequency 4G Band
#> 9 Qmobile_Noir-M6 Processor CPU
#> 10 Qmobile_Noir-M6 Processor Chipset
#> # ... with 109 more rows, and 2 more variables: val1 <chr>, val2 <chr>
The data is still pretty messy, but at least it's all there.
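For readers who would rather avoid the tidyverse, the lapply route mentioned above can be sketched in base R. This is only an illustration: fetch_specs is a hypothetical stand-in that returns a small made-up table so the example runs offline. In real use its body would hold the read_html / html_node / html_table calls shown above.

```r
# Hypothetical stand-in for the scraping step, so this runs without network access.
# In real use the body would call read_html(), html_node(".specs") and html_table().
fetch_specs <- function(model) {
  data.frame(var = c("OS", "Weight"),
             val = paste(model, c("os", "weight")),  # placeholder values, not real specs
             stringsAsFactors = FALSE)
}

models <- c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8")

# lapply() returns one list element per model, so nothing gets overwritten
specs_list <- lapply(models, fetch_specs)
names(specs_list) <- models  # label each table with the model it came from

str(specs_list, max.level = 1)
```

Each table can then be pulled out by name, for example specs_list[["Qmobile_Noir-A1"]].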
Answer 2:
It is capturing all three values, but it overwrites them on each iteration. That's why you only see one value, the one for the last page.
You need to initialise a variable before you enter the loop. I suggest a list, so you can store the data from each iteration. Something like:
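# Create the list before the loop so each iteration can store its table in it
final_table <- list()
for(i in 1:3) {
  webpage <- read_html(paste0("https://www.whatmobile.com.pk/", df$Links[i]))
  data <- webpage %>%
    html_nodes(".specs") %>%
    .[[1]] %>%
    html_table(fill = TRUE)
  # Store this page's table at position i instead of overwriting the previous one
  final_table[[i]] <- data.frame(data, stringsAsFactors = FALSE)
}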
This way, it appends new data to the list on each iteration.
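The question also asks for the rows to be labelled by model. One possible way to get there from a list like final_table is sketched below. The tables are toy stand-ins with made-up values and hypothetical column names (X1, X2) so the example runs offline, and it assumes every scraped table has the same columns, which rbind requires.

```r
# Toy stand-ins for the scraped tables (made-up values), one per entry in df$Links
df <- data.frame(Links = c("Qmobile_Noir-M6", "Qmobile_Noir-A1", "Qmobile_Noir-E8"),
                 stringsAsFactors = FALSE)
final_table <- list(
  data.frame(X1 = "OS", X2 = "a", stringsAsFactors = FALSE),
  data.frame(X1 = "OS", X2 = "b", stringsAsFactors = FALSE),
  data.frame(X1 = "OS", X2 = "c", stringsAsFactors = FALSE)
)

# Prepend a Links column to each table, pairing table i with df$Links[i]
labelled <- Map(function(tbl, link) cbind(Links = link, tbl, stringsAsFactors = FALSE),
                final_table, df$Links)

# Stack the labelled tables into a single data frame
combined <- do.call(rbind, labelled)
rownames(combined) <- NULL
combined
```

Keeping the label as an ordinary column means the result can be filtered or grouped by model later without going back to the list.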
Source: https://stackoverflow.com/questions/44910955/web-scraping-in-r-with-loop-from-data-frame