Web-Scraping in R programming (rvest)

耶瑟儿~ 2021-01-16 01:37

I am trying to scrape all details (Type Of Traveller, Seat Type, Route, Date Flown, Seat Comfort, Cabin Staff Service, Food & Beverages, Inflight Entertainment,Gro

1 answer
  • 2021-01-16 02:12

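    It's a little involved, because the rating for each field has to be worked out by counting the filled stars in its cell. html_table() cannot do that on its own (it reads every star cell as the text "12345"), so I would use html_table() to get the tables, then overwrite those cells with the calculated star values: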

    library(rvest)     # read_html(), html_nodes(), html_table() and the %>% pipe
    library(magrittr)  # equals(), used in count_stars_in_cell()
    
    my_url <- "https://www.airlinequality.com/airline-reviews/emirates/"
    
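    # Count the filled stars in one rating cell: each star is a child node,
    # and filled stars carry the class "star fill".
    count_stars_in_cell <- function(cell)
    {
      html_children(cell) %>%
      html_attr("class")  %>%
      equals("star fill") %>%
      sum(na.rm = TRUE)
    }
    
    # Return the star counts for every rating cell in one review table.
    get_ratings_each_review <- function(review)
    {
      review                             %>%
      html_nodes(".review-rating-stars") %>%
      lapply(count_stars_in_cell)        %>%
      unlist
    }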
    
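    # Grab every <table> node on the page (one per review).
    all_tables <- read_html(my_url) %>%
                  html_nodes("table")
    
    # Parse each table into a data frame.
    reviews <- lapply(all_tables, html_table)
    
    # Star counts for the rating cells of each table, in document order.
    ratings <- lapply(all_tables, get_ratings_each_review)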
    
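    # Star cells come through html_table() as the text "12345";
    # overwrite them with the counts computed above.
    for (i in seq_along(reviews))
    {
      reviews[[i]]$X2[reviews[[i]]$X2 == "12345"] <- ratings[[i]]
    }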
    
    print(reviews)
    

    This gives you a list with one table for each review. These should be straightforward to combine into a single data frame.
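
The combining step could look something like the sketch below. It is only a minimal base-R illustration, assuming each table has the two-column layout the loop above relies on, with the field name in `X1` and the value in `X2`. The `reviews` list here is made-up sample data standing in for the scraped result, and `all_fields`, `rows` and `combined` are illustrative names, not part of the original answer.

```r
# Made-up stand-in for the scraped `reviews` list (assumed layout: X1 = field, X2 = value).
reviews <- list(
  data.frame(X1 = c("Type Of Traveller", "Seat Comfort"),
             X2 = c("Solo Leisure", "4"),
             stringsAsFactors = FALSE),
  data.frame(X1 = c("Type Of Traveller", "Seat Comfort", "Food & Beverages"),
             X2 = c("Business", "2", "3"),
             stringsAsFactors = FALSE)
)

# Every field name that appears in at least one review, in first-seen order.
all_fields <- unique(unlist(lapply(reviews, function(tbl) tbl$X1)))

# Turn each review into one row, with NA where a review lacks a field.
rows <- lapply(reviews, function(tbl) {
  values <- setNames(as.character(tbl$X2), tbl$X1)
  unname(values[all_fields])
})

# Stack the rows into a single data frame, one row per review.
combined <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
names(combined) <- all_fields

print(combined)
```

Everything stays as character here because `X2` mixes text and ratings; the rating columns can be converted afterwards with `as.integer()` if numeric values are needed.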
