rvest: how to select a specific CSS node by id

长情又很酷 2021-02-07 11:36

I'm trying to use the rvest package to scrape data from a web page. In a simplified form, the HTML looks like this:

3 Answers
  • 2021-02-07 12:00

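    You can use XPath:

    library(rvest)
    text <- '<div class="style">
       <input id="a" value="123">
       <input id="b">
    </div>'
    
    # Parse the HTML string into a document
    h <- read_html(text)
    
    # Select the element whose id is "a" and read its value attribute
    # (html_attr() is rvest's own accessor; xml_attr() needs xml2 attached)
    h %>% 
      html_nodes(xpath = '//*[@id="a"]') %>%
      html_attr("value")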
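
    The XPath pattern above can be wrapped in a small function when several ids need to be looked up. This is only a sketch: attr_by_id is a hypothetical helper, not part of rvest, and it assumes the id contains no double quotes.

```r
library(rvest)

# Hypothetical helper (not part of rvest): read one attribute
# from the element that carries the given id.
attr_by_id <- function(doc, id, attr) {
  # Build the XPath; assumes `id` contains no double quotes
  xpath <- sprintf('//*[@id="%s"]', id)
  doc %>%
    html_nodes(xpath = xpath) %>%
    html_attr(attr)
}

h <- read_html('<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>')

value_a <- attr_by_id(h, "a", "value")  # the input with id "a" has a value
value_b <- attr_by_id(h, "b", "value")  # the input with id "b" has none, so NA
```

    An element that exists but lacks the attribute yields NA rather than an error, so the result is worth checking before it is used.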
    

    The easiest way to get a CSS or XPath selector is to use http://selectorgadget.com/. For a specific attribute like yours, you can also use Chrome's developer tools: right-click the element in the Elements panel and choose Copy > Copy XPath.

  • 2021-02-07 12:04

    This will work just fine with straight CSS selectors:

    library(rvest)
    
    doc <- '<div class="style">
       <input id="a" value="123">
       <input id="b">
    </div>'
    
    pg <- read_html(doc)  # html() is deprecated in rvest; read_html() replaces it
    html_attr(html_nodes(pg, "div > input:first-of-type"), "value")
    
    ## [1] "123"
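
    One caveat with a positional selector such as input:first-of-type is that it depends on the order of the elements. The sketch below uses a hypothetical variant of the snippet with the two inputs swapped (doc_swapped is an assumption, not from the question) and compares the positional selector with an id-based one.

```r
library(rvest)

# Hypothetical variant of the snippet: same inputs, opposite order
doc_swapped <- '<div class="style">
   <input id="b">
   <input id="a" value="123">
</div>'

pg <- read_html(doc_swapped)

# The positional selector now matches the input with id "b",
# which has no value attribute, so the result is NA
by_position <- html_attr(html_nodes(pg, "div > input:first-of-type"), "value")

# The id-based selector still finds the intended input
by_id <- html_attr(html_nodes(pg, "input#a"), "value")
```

    If the page layout can change, selecting by id is usually the more robust choice.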
    
  • 2021-02-07 12:21

    Adding an answer because I don't see the simple CSS selector shorthand for selecting by id, which is #your_id_name:

    h %>% 
      html_node('#a') %>%
      html_attr('value')
    

    which outputs "123" as desired.

    Same setup as the others:

    require(rvest)
    text <- '<div class="style">
       <input id="a" value="123">
       <input id="b">
    </div>'
    
    h <- read_html(text)
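
    It may also help to see what happens when no element has the requested id. This is a minimal sketch using the same HTML; "#missing" is a made-up id used only for illustration.

```r
library(rvest)

h <- read_html('<div class="style">
   <input id="a" value="123">
   <input id="b">
</div>')

# "#a" is CSS shorthand for "the element whose id is a"
value_a <- h %>% html_node("#a") %>% html_attr("value")

# No element has id "missing": html_node() yields a missing node,
# so the attribute comes back as NA
value_missing <- h %>% html_node("#missing") %>% html_attr("value")

# html_nodes() returns every match, so no match gives a zero-length result
values_none <- h %>% html_nodes("#missing") %>% html_attr("value")
```

    html_node() always returns one result per input, which makes a missing id show up as NA, while html_nodes() silently returns nothing.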
    