Extract Website Tables using rvest and html_nodes() and html_table()

难免孤独 2021-01-26 16:16

I'm trying to extract data from the Basketball Reference website.

library(rvest)
data7 <- read_html("http://www.basketball-reference.com/teams/CLE/2017.html


        
1 answer
  • 2021-01-26 17:14

    There is already an answer to this, but it applies to an older version of the website. The reason you cannot get the other tables is that they are created dynamically: in the raw page that R downloads, the tables you want sit inside commented-out strings. Use Chrome's Inspect Element on the page to see what I am referring to. The other answer is here: How to scrape tables inside a comment tag in html with R?
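To make the commented-out behaviour concrete, here is a minimal sketch that runs on an inline HTML string rather than the live site. The markup and the `visible`/`hidden` ids are made up for illustration, and xml2 is attached explicitly on the assumption that your rvest version does not attach it for you.

```r
library(rvest) # html_table() and the %>% pipe
library(xml2)  # read_html(), xml_find_all(), xml_text()

# Made-up page: one ordinary table and one wrapped in an HTML comment
page_src <- '<html><body>
<table id="visible"><tr><th>a</th></tr><tr><td>1</td></tr></table>
<!-- <table id="hidden"><tr><th>b</th></tr><tr><td>2</td></tr></table> -->
</body></html>'

page <- read_html(page_src)

# Tables the parser sees directly: the commented one is not among them
visible <- page %>%
  xml_find_all("//table") %>%
  html_table()

# Tables recovered by re-parsing the text of the comment nodes
hidden <- page %>%
  xml_find_all("//comment()") %>%
  xml_text() %>%
  paste0(collapse = "") %>%
  read_html() %>%
  xml_find_all("//table") %>%
  html_table()

length(visible) # number of tables found in the normal markup
length(hidden)  # number of tables found inside comments
```

Each approach finds only its own table, which is why a plain table lookup on the raw page misses most of the data.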

    But for your year data:

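    library(rvest) # html_table() and the %>% pipe
    library(xml2)  # read_html(), xml_find_all(), xml_text()

    A <- read_html('http://www.basketball-reference.com/teams/CLE/2017.html') %>% # read in the raw webpage
      xml_find_all('//comment()') %>% # use XPath to find all comment nodes
      xml_text() %>%                  # extract the text of each comment
      paste0(collapse = "") %>%       # collapse everything into a single string
      read_html() %>%                 # re-parse that string as HTML
      xml_find_all("//table") %>%     # find every table that was inside a comment
      html_table()                    # convert each table to a data frame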
    
    cat(capture.output(lapply(A, head, 1)), sep = "\n")
    
    
    [[1]]
                       Date Type                                                                                       Note
    1 Kevin Love 2017-02-12 Knee Love is expected to miss six weeks after undergoing arthroscopic surgery on his left knee.
    
    [[2]]
                X1                X2
    1 Jim Boylan   Assistant Coach
    
    [[3]]
            G    MP   FG  FGA   FG%  3P  3PA  3P%   2P  2PA   2P%   FT  FTA   FT% ORB  DRB  TRB  AST STL BLK TOV   PF  PTS
    1 Team 58 14020 2305 4938 0.467 761 1952 0.39 1544 2986 0.517 1073 1420 0.756 564 1988 2552 1304 414 237 804 1033 6444
    
    [[4]]
       NA NA NA NA  NA  NA  NA   NA   NA   NA Advanced   NA Offense Four Factors   NA   NA     NA Defense Four Factors   NA   NA     NA               NA
    1   W  L PW PL MOV SOS SRS ORtg DRtg Pace      FTr 3PAr                 eFG% TOV% ORB% FT/FGA                 eFG% TOV% DRB% FT/FGA Arena Attendance
    
    [[5]]
      Rk              Age  G GS   MP  FG  FGA   FG%  3P 3PA   3P%  2P  2PA   2P%  eFG%  FT FTA   FT% ORB DRB TRB AST STL BLK TOV  PF PTS/G
    1  1 LeBron James  32 54 54 37.5 9.6 17.7 0.541 1.7 4.4 0.387 7.9 13.3 0.592 0.589 4.8 6.9 0.691 1.1 6.7 7.9 8.9 1.4 0.6 4.3 1.7  25.7
    
    [[6]]
      Rk              Age  G GS   MP  FG FGA   FG% 3P 3PA   3P%  2P 2PA   2P%  eFG%  FT FTA   FT% ORB DRB TRB AST STL BLK TOV PF  PTS
    1  1 LeBron James  32 54 54 2026 518 957 0.541 92 238 0.387 426 719 0.592 0.589 259 375 0.691  62 363 425 479  74  32 230 92 1387
    
    [[7]]
      Rk              Age  G GS   MP  FG FGA   FG%  3P 3PA   3P%  2P  2PA   2P%  FT FTA   FT% ORB DRB TRB AST STL BLK TOV  PF  PTS
    1  1 LeBron James  32 54 54 2026 9.2  17 0.541 1.6 4.2 0.387 7.6 12.8 0.592 4.6 6.7 0.691 1.1 6.5 7.6 8.5 1.3 0.6 4.1 1.6 24.6
    
    [[8]]
      Rk              Age  G GS   MP   FG  FGA   FG%  3P 3PA   3P%   2P  2PA   2P%  FT FTA   FT% ORB DRB  TRB  AST STL BLK TOV  PF PTS    ORtg DRtg
    1  1 LeBron James  32 54 54 2026 12.7 23.4 0.541 2.3 5.8 0.387 10.4 17.6 0.592 6.3 9.2 0.691 1.5 8.9 10.4 11.7 1.8 0.8 5.6 2.3  34 NA  118  107
    
    [[9]]
      Rk              Age  G   MP  PER   TS%  3PAr   FTr ORB% DRB% TRB% AST% STL% BLK% TOV% USG%    OWS DWS  WS WS/48    OBPM DBPM BPM VORP
    1  1 LeBron James  32 54 2026 26.3 0.618 0.249 0.392  3.5 19.1 11.6 41.7  1.8  1.3   17 29.4 NA 6.9 2.4 9.3  0.22 NA  6.3  1.8   8  5.1
    
    [[10]]
         NA   NA   NA   NA   NA   NA                   NA   NA   NA   NA NA   NA              NA   NA   NA   NA NA   NA 2-Pt Field Goals    NA   NA 3-Pt Field Goals     NA
    1  <NA> <NA> <NA> <NA> <NA> <NA> % of FGA by Distance <NA> <NA> <NA> NA <NA> FG% by Distance <NA> <NA> <NA> NA <NA>                  Dunks <NA>                  Corner
        NA     NA   NA
    1 <NA> Heaves <NA>
    
    [[11]]
      Rk                   Salary
    1  1 LeBron James $30,963,450
    
    [[12]]
                               Yr  Tm Rd Pk             Team     G  MP FG FGA   FG% 3P 3PA 3P% FT FTA   FT% ORB DRB TRB AST STL BLK TOV PF PTS
    1 Vladimir Veremeenko NA 2006 WAS  2 48 NA Reggio Emilia it 18 139 17  29 0.586  0   0  NA  4   9 0.444  14  10  24   8   2   3   9 33  38
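
The list printed above is indexed by position, which can shift if the site adds or removes a table. One possible refinement is to name each data frame after its table's `id` attribute. This is a sketch on made-up inline HTML: the ids `per_game` and `salaries` and the cell values are placeholders, and the ids on the real page may differ.

```r
library(rvest) # html_table() and the %>% pipe
library(xml2)  # read_html(), xml_find_all(), xml_text(), xml_attr()

# Made-up stand-in for the real page: two tables, each inside a comment
page_src <- '<html><body>
<!-- <table id="per_game"><tr><th>Player</th><th>PTS</th></tr><tr><td>A</td><td>10</td></tr></table> -->
<!-- <table id="salaries"><tr><th>Player</th><th>Salary</th></tr><tr><td>A</td><td>$1</td></tr></table> -->
</body></html>'

# Same comment-unwrapping steps, but stop at the table nodes
table_nodes <- read_html(page_src) %>%
  xml_find_all("//comment()") %>%
  xml_text() %>%
  paste0(collapse = "") %>%
  read_html() %>%
  xml_find_all("//table")

# Name each parsed table after the id attribute of its <table> element
named <- setNames(html_table(table_nodes), xml_attr(table_nodes, "id"))

names(named)         # the ids, in document order
named[["salaries"]]  # look a table up by name instead of by position
```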
    