Web scraping a data table with R rvest

夕颜 2021-01-21 18:58

I'm trying to scrape a table from the following website:

http://www.basketball-reference.com/leagues/NBA_2016.html?lid=header_seasons#all_misc_stats

The table

2 answers
  •  清酒与你
    2021-01-21 19:40

    Since the table you want is hidden in a comment until revealed by JavaScript, you either need to use RSelenium to run the JavaScript (which is kind of a pain), or parse the comments (which is still a pain, but slightly less so).

    library(rvest)
    library(readr)    # for type_convert
    
    adv <- "http://www.basketball-reference.com/leagues/NBA_2016.html?lid=header_seasons#all_misc_stats"
    
    h <- adv %>% read_html()    # be kind; don't rescrape unless necessary
    
    df <- h %>% html_nodes(xpath = '//comment()') %>%    # select comments
        html_text() %>%    # extract comment text
        paste(collapse = '') %>%    # collapse to single string
        read_html() %>%    # reread as HTML
        html_node('table#misc_stats') %>%    # select desired node
        html_table() %>%    # parse node to table
        { setNames(.[-1, ], paste0(names(.), .[1, ])) } %>%    # extract names from first row
        type_convert()    # fix column types
    
    df[1:6, 1:14]
    ##   Rk                   Team  Age PW PL   MOV   SOS   SRS  ORtg  DRtg Pace   FTr  3PAr   TS%
    ## 2  1 Golden State Warriors* 27.4 65 17 10.76 -0.38 10.38 114.5 103.8 99.3 0.250 0.362 0.593
    ## 3  2     San Antonio Spurs* 30.3 67 15 10.63 -0.36 10.28 110.3  99.0 93.8 0.246 0.223 0.564
    ## 4  3 Oklahoma City Thunder* 25.8 59 23  7.28 -0.19  7.09 113.1 105.6 96.7 0.292 0.275 0.565
    ## 5  4   Cleveland Cavaliers* 28.1 57 25  6.00 -0.55  5.45 110.9 104.5 93.3 0.259 0.352 0.558
    ## 6  5  Los Angeles Clippers* 29.7 53 29  4.28 -0.15  4.13 108.3 103.8 95.8 0.318 0.324 0.556
    ## 7  6       Toronto Raptors* 26.3 53 29  4.50 -0.42  4.08 110.0 105.2 92.9 0.328 0.287 0.552
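
To try the comment-parsing approach without hitting the site, here is a minimal offline sketch. The inline `page` string is a made-up miniature of the structure described above (a table wrapped in an HTML comment), not the real site markup, and `read_commented_table()` is a hypothetical helper name, not part of rvest.

```r
library(rvest)

# Assumption: a tiny stand-in page whose table is hidden inside an HTML comment.
page <- '<html><body>
<div id="all_misc_stats">
<!--
<table id="misc_stats">
  <tr><th>Team</th><th>Age</th></tr>
  <tr><td>A</td><td>27.4</td></tr>
  <tr><td>B</td><td>30.3</td></tr>
</table>
-->
</div>
</body></html>'

# Hypothetical helper: collect all comments, reparse them as HTML,
# then select and parse a single table by CSS selector.
read_commented_table <- function(doc, css) {
  doc %>%
    html_nodes(xpath = "//comment()") %>%    # select comment nodes
    html_text() %>%                          # extract the commented-out markup
    paste(collapse = "") %>%                 # collapse to a single string
    read_html() %>%                          # reread that string as HTML
    html_node(css) %>%                       # select the desired table
    html_table()                             # parse it into a data frame
}

demo <- read_commented_table(read_html(page), "table#misc_stats")
print(demo)
```

For the real page, the same helper could presumably be called as `read_commented_table(h, "table#misc_stats")`, followed by the header fix and `type_convert()` from the pipeline above, since that table carries an extra header row.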
    
