Scrape multiple linked HTML tables in R with rvest

花落未央 2021-02-03 11:35

The article at http://www.ajnr.org/content/30/7/1402.full contains four links to HTML tables, which I would like to scrape with rvest.

With the help of the CSS selector:

<
2 answers
  • 2021-02-03 12:04

    Here's one approach:

    library(rvest)
    
    url <- "http://www.ajnr.org/content/30/7/1402.full"
    page <- read_html(url)
    
    # First find all the urls
    table_urls <- page %>% 
      html_nodes(".table-inline li:nth-child(1) a") %>%
      html_attr("href") %>%
      xml2::url_absolute(url)
    
    # Then loop over the urls, downloading & extracting the table
    lapply(table_urls, . %>% read_html() %>% html_table())
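
To check the selector and the URL resolution without hitting the network, the same pipeline can be run against an inline HTML fragment. This is only a sketch: the markup below is a hypothetical stand-in for the article page, and the `href` values are assumed, not copied from the real site.

```r
library(rvest)

# Hypothetical stand-in for the article page: two table widgets, each with
# the table link as the first list item and an unrelated link as the second.
html <- '<html><body>
  <div class="table-inline"><ul>
    <li><a href="1402/T1.expansion.html">In this window</a></li>
    <li><a href="1402/T1.ppt">PowerPoint</a></li>
  </ul></div>
  <div class="table-inline"><ul>
    <li><a href="1402/T2.expansion.html">In this window</a></li>
    <li><a href="1402/T2.ppt">PowerPoint</a></li>
  </ul></div>
</body></html>'

base_url <- "http://www.ajnr.org/content/30/7/1402.full"
page <- read_html(html)  # a string starting with "<" is parsed as literal HTML

# Same selector as above: the first link inside each .table-inline block
links <- page %>%
  html_nodes(".table-inline li:nth-child(1) a") %>%
  html_attr("href") %>%
  xml2::url_absolute(base_url)

# Wrapper that returns NULL instead of stopping when one page fails to load
read_tables_safely <- function(u) {
  tryCatch(html_table(read_html(u)), error = function(e) NULL)
}

# Needs network access, so it is left commented out here:
# tables <- lapply(links, read_tables_safely)
```

Wrapping the download in `tryCatch()` is a design choice rather than a requirement: it lets the loop finish even if one of the linked pages is unavailable.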
    
  • 2021-02-03 12:10

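    Alternatively, build the four table URLs directly and parse them with readHTMLTable() from the XML package:

    library(XML)

    main_url <- "http://www.ajnr.org/content/30/7/1402/"
    urls <- paste0(main_url, c("T1.expansion", "T2.expansion", "T3.expansion", "T4.expansion"), ".html")
    tables <- list()
    for (i in seq_along(urls)) {
      # readHTMLTable() returns every <table> on the page; drop any NULL entries
      total <- Filter(Negate(is.null), readHTMLTable(urls[i]))
      # Keep the table with the most rows, assumed to be the data table
      n.rows <- vapply(total, nrow, integer(1))
      tables[[i]] <- as.data.frame(total[[which.max(n.rows)]])
    }
    tables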
    
    #[[1]]
    #  Glioma Grade Sensitivity Specificity    PPV    NPV
    #1    II vs III       50.0%       92.9%  80.0%  76.5%
    #2     II vs IV      100.0%      100.0% 100.0% 100.0%
    #3    III vs IV       78.9%       87.5%  93.8%  63.6%
    
    #[[2]]
    #  Glioma Grade Sensitivity Specificity   PPV    NPV
    #1    II vs III       87.5%       71.4% 63.6%  90.9%
    #2     II vs IV      100.0%       85.7% 90.5% 100.0%
    #3    III vs IV       89.5%       75.0% 89.5%  75.0%
    
    #[[3]]
    #  Criterion Sensitivity Specificity    PPV   NPV
    #1       ≥1*       85.2%       92.9%  95.8% 76.5%
    #2        ≥2       81.5%      100.0% 100.0% 73.7%
    
    #[[4]]
    #  Criterion Sensitivity Specificity   PPV   NPV
    #1     <1.92       96.3%       71.4% 86.7% 90.9%
    #2     <2.02       92.6%       71.4% 86.2% 83.3%
    #3    <2.12*       92.6%       85.7% 92.6% 85.7%
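
The selection step in the loop above (keep the table with the most rows) can be isolated into a small helper and tried on made-up data. This is a sketch in base R only; `largest_table` and the `candidates` list are hypothetical names and values introduced for illustration, not part of the XML package.

```r
# Return the data frame with the most rows from a list that may also
# contain NULLs or other non-table entries; NULL if no data frame is found.
largest_table <- function(tables) {
  tables <- Filter(is.data.frame, tables)
  if (length(tables) == 0) {
    return(NULL)
  }
  n_rows <- vapply(tables, nrow, integer(1))
  tables[[which.max(n_rows)]]
}

# Made-up input imitating a page with a layout table and one data table
candidates <- list(
  NULL,
  data.frame(x = 1),
  data.frame(
    Criterion = c("a", "b", "c"),
    Sensitivity = c("1%", "2%", "3%")
  )
)

result <- largest_table(candidates)
```

Filtering on `is.data.frame` before counting rows keeps the row counts aligned with the list they index, which matters when some entries are `NULL`.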
    