How to scrape a table with rvest and xpath?

Happy的楠姐 2021-01-04 12:00

Using the following documentation, I have been trying to scrape a series of tables from marketwatch.com.

Here is the one represented by the code below:

1 answer
  •  -上瘾入骨i
    2021-01-04 12:32

    That website doesn't use an HTML table, so html_table() can't find anything. It actually uses div elements with the classes "column" and "data lastcolumn".

    So you can do something like:

    library(rvest)

    url <- "http://www.marketwatch.com/investing/stock/IRS/profile"
    page <- read_html(url)  # download and parse the page once

    # nodes holding the row labels
    valuation_col <- page %>%
        html_nodes(xpath='//*[@class="column"]')

    # nodes holding the matching values
    valuation_data <- page %>%
        html_nodes(xpath='//*[@class="data lastcolumn"]')
    

    Or even

    url %>%
      read_html() %>%
      html_nodes(xpath='//*[@class="section"]')
    

    That should get you most of the way there.
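
    To go from the selected nodes to something table-like, one option is to extract the text of the label and value nodes and pair them by position. The sketch below is only an illustration: the inline HTML is a made-up stand-in for the real page (the element types, labels and numbers are assumptions, not data from marketwatch.com), and the names html, labels, values and valuation are hypothetical.

```r
library(rvest)

# Hypothetical markup standing in for the profile page; the real page may differ.
html <- '<html><body><div class="section">
  <p class="column">P/E Current</p><p class="data lastcolumn">12.5</p>
  <p class="column">Price to Sales Ratio</p><p class="data lastcolumn">1.8</p>
</div></body></html>'

page <- read_html(html)

# Text of the label nodes (class attribute exactly "column")
labels <- page %>%
  html_nodes(xpath = '//*[@class="column"]') %>%
  html_text(trim = TRUE)

# Text of the value nodes (class attribute exactly "data lastcolumn")
values <- page %>%
  html_nodes(xpath = '//*[@class="data lastcolumn"]') %>%
  html_text(trim = TRUE)

# Pairing by position assumes each label has exactly one value, in the same order.
valuation <- data.frame(metric = labels, value = values, stringsAsFactors = FALSE)
print(valuation)
```

    Note that @class="column" only matches nodes whose class attribute is exactly that string, so on a real page it is worth checking that labels and values have the same length before combining them.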

    Please also read their terms of use - particularly 3.4.
