Web scraping in R (rvest)

后端未结


I am trying to scrape all details (Type Of Traveller, Seat Type, Route, Date Flown, Seat Comfort, Cabin Staff Service, Food & Beverages, Inflight Entertainment, Gro…


1 Answer

面向向阳花

2021-01-16 02:12

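It's a little involved. html_table() only picks up the text of each cell, so a star-rating cell comes through as the digits "12345" regardless of how many stars are filled. To get the rating for each field you have to count the filled stars in each rating cell yourself. I would use html_table() to build each review's table, then re-insert the calculated star counts: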

library(rvest)  # read_html(), html_nodes(), html_table(), html_children(), html_attr(); also re-exports %>%

my_url <- "https://www.airlinequality.com/airline-reviews/emirates/"

# Count the child <span>s of a rating cell whose class is exactly "star fill"
count_stars_in_cell <- function(cell) {
  classes <- cell %>% html_children() %>% html_attr("class")
  sum(classes == "star fill", na.rm = TRUE)
}

# One star count per rating cell in a single review table
get_ratings_each_review <- function(review) {
  review %>%
    html_nodes(".review-rating-stars") %>%
    lapply(count_stars_in_cell) %>%
    unlist()
}

all_tables <- read_html(my_url) %>% html_nodes("table")
reviews <- lapply(all_tables, html_table)               # star cells come through as "12345"
ratings <- lapply(all_tables, get_ratings_each_review)

# Replace each "12345" placeholder with the real star count, in order
for (i in seq_along(reviews)) {
  reviews[[i]]$X2[reviews[[i]]$X2 == "12345"] <- ratings[[i]]
}

print(reviews)

This gives you a list with one table for each review. These should be straightforward to combine into a single data frame.
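A minimal offline sketch of the same approach, carried through to a single data frame. It parses a hand-built HTML string instead of the live site, so the markup (the `review-ratings` table class, two fields per review, the example values) is hypothetical and only mirrors the class names the code above targets; the real page may differ. The combining step uses `dplyr::bind_rows()`, which is an addition, not part of the original answer.

```r
library(rvest)  # read_html(), html_nodes(), html_table(), html_children(), html_attr()
library(dplyr)  # bind_rows()

# Hypothetical helper: a rating cell with `filled` of its 5 spans marked "star fill".
# Spans are concatenated with no whitespace, so html_table() reads the cell as "12345".
star_cell <- function(filled) {
  classes <- ifelse(seq_len(5) <= filled, "star fill", "star")
  spans <- sprintf('<span class="%s">%d</span>', classes, seq_len(5))
  paste0('<td class="review-rating-stars">', paste0(spans, collapse = ""), "</td>")
}

# Hypothetical helper: one review table with a text field and a star field
review_table <- function(seat_type, filled) {
  paste0('<table class="review-ratings">',
         "<tr><td>Seat Type</td><td>", seat_type, "</td></tr>",
         "<tr><td>Seat Comfort</td>", star_cell(filled), "</tr>",
         "</table>")
}

page <- read_html(paste0("<html><body>",
                         review_table("Economy Class", 3),
                         review_table("Business Class", 5),
                         "</body></html>"))

# Same counting idea as the answer: spans whose class is exactly "star fill"
count_stars_in_cell <- function(cell) {
  sum(html_attr(html_children(cell), "class") == "star fill", na.rm = TRUE)
}

# Turn one review table into a one-row tibble, swapping "12345" for the star count
review_to_row <- function(table_node) {
  tbl    <- html_table(table_node)        # no <th>, so column 1 = field, column 2 = value
  fields <- as.character(tbl[[1]])
  values <- as.character(tbl[[2]])
  stars  <- vapply(html_nodes(table_node, ".review-rating-stars"),
                   count_stars_in_cell, integer(1))
  is_star <- values == "12345"
  stopifnot(sum(is_star) == length(stars))  # expect one count per star cell
  values[is_star] <- as.character(stars)
  tibble::as_tibble(as.list(setNames(values, fields)))
}

all_reviews <- bind_rows(lapply(html_nodes(page, "table"), review_to_row))
print(all_reviews)
```

Because `bind_rows()` matches columns by name and fills missing ones with `NA`, reviews that omit a field (for example, no Inflight Entertainment rating) still line up in the combined data frame. To try it against the real page, replace `page` with `read_html(my_url)`, keeping in mind the markup assumptions above.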
