Web scraping data table with R rvest

夕颜 2021-01-21 18:58

I'm trying to scrape a table from the following website:

http://www.basketball-reference.com/leagues/NBA_2016.html?lid=header_seasons#all_misc_stats

The table

2 Answers
  •  爱一瞬间的悲伤
    2021-01-21 19:39

    Here is another messy solution. Read the page, save it, read it back in, remove the comment markers, and then process the page:

    library(rvest)
    library(xml2)  # provides write_xml()

    gameUrl <- "http://www.basketball-reference.com/leagues/NBA_2016.html?lid=header_seasons#all_misc_stats"
    gameHtml <- gameUrl %>% read_html()
    #gameHtml %>% html_nodes("tbody")
    
    #Only save and work with the body
    body<-html_node(gameHtml,"body")
    write_xml(body, "nba.xml")
    
    #Find and remove comments
    lines<-readLines("nba.xml")
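    #Drop the lines holding the opening and closing comment markers
    lines<-lines[!grepl("<!--", lines, fixed=TRUE)]
    lines<-lines[!grepl("-->", lines, fixed=TRUE)]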
    writeLines(lines, "nba2.xml")
    
    #Read the file back in and process normally
    body<-read_html("nba2.xml")
    
    #Table 10 was found by looking at all of the tables and picking the one of interest
    tableofinterest<-(html_nodes(body, "tbody")[10])
    
    rows<-html_nodes(tableofinterest, "tr")
    tableOfResults<-t(sapply(rows, function(x) {html_text(html_nodes(x, "td"))}))
    #Find the column titles from the first record's attributes
    titles<-html_attrs(html_nodes(rows[1], "td"))
    dfnames<-unlist(titles)[seq(2, 2*length(titles), by=2)]
    
    #Final results are stored in data frame "df"
    df<-as.data.frame(tableOfResults)
    names(df)<-dfnames
    

    This code works but should be simplified! This was based on a similar solution which I posted here: How to get table using rvest()
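
One possible simplification, sketched here under assumptions rather than tested against the live site: instead of writing the page to disk and filtering lines, select the comment nodes with XPath, re-parse their text as HTML, and let `html_table()` build the data frame. The inline HTML string and the table id `misc_stats` below are hypothetical stand-ins for the real page, where the table of interest sits inside an HTML comment.

```r
library(rvest)
library(xml2)

# Hypothetical stand-in for the downloaded page (made-up rows, assumed table id).
# The table is wrapped in an HTML comment, so a normal table selector cannot see it.
html <- '<html><body>
<div id="all_misc_stats">
<!--
<table id="misc_stats">
<thead><tr><th>Team</th><th>W</th><th>L</th></tr></thead>
<tbody>
<tr><td>Team A</td><td>10</td><td>5</td></tr>
<tr><td>Team B</td><td>7</td><td>8</td></tr>
</tbody>
</table>
-->
</div>
</body></html>'

page <- read_html(html)

# Collect the text content of every comment node on the page
comment_text <- page %>%
  html_nodes(xpath = "//comment()") %>%
  html_text()

# Re-parse the combined comment text as HTML so the hidden table becomes a real node
hidden <- read_html(paste(comment_text, collapse = ""))

# Pick the table by its (assumed) id and convert it to a data frame
misc <- hidden %>%
  html_node("table#misc_stats") %>%
  html_table()

print(misc)
```

For the real page, `read_html(html)` would be replaced by `read_html(gameUrl)`, and the selector would need to match whatever id the commented-out table actually carries. This avoids the temporary files and the manual row-by-row extraction.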
