I'm trying to use the rvest package to scrape data from a web page. In a simple format, the HTML code looks like this:
You can use xpath:
library(rvest)
text <- '<div class="style">
<input id="a" value="123">
<input id="b">
</div>'
h <- read_html(text)
h %>%
  html_nodes(xpath = '//*[@id="a"]') %>%
  # html_attr() is rvest's own accessor; xml_attr() belongs to xml2 and may need library(xml2)
  html_attr("value")
The easiest way to get CSS and XPath selectors is to use http://selectorgadget.com/. For a specific attribute like yours, use Chrome's developer tools to get the XPath: right-click the element in the Elements panel and choose Copy > Copy XPath.
This will work just fine with straight CSS selectors:
library(rvest)
doc <- '<div class="style">
<input id="a" value="123">
<input id="b">
</div>'
pg <- read_html(doc)  # html() is deprecated in favor of read_html()
html_attr(html_nodes(pg, "div > input:first-of-type"), "value")
## [1] "123"
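One caveat with the positional selector: first-of-type depends on the order of the inputs, so it can break if the markup is rearranged. A minimal sketch, assuming the same snippet, that selects by attribute instead and also shows what comes back for an input without a value (the names by_id and all_values are just for illustration):

```r
library(rvest)

doc <- '<div class="style">
<input id="a" value="123">
<input id="b">
</div>'
pg <- read_html(doc)

# Attribute selector: matches the input whose id is "a", wherever it sits in the div
by_id <- html_attr(html_nodes(pg, 'input[id="a"]'), "value")

# Value of every input directly under the div; a missing attribute comes back as NA
all_values <- html_attr(html_nodes(pg, "div.style > input"), "value")
```

Because html_attr() returns NA for a missing attribute rather than dropping the element, all_values keeps one entry per input, which makes it easy to line the values up with their ids.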
Adding an answer because I don't see the simple CSS selector shorthand for selecting by id mentioned yet, which is #your_id_name:
h %>%
html_node('#a') %>%
html_attr('value')
which outputs "123" as desired.
Same setup as the others:
library(rvest)
text <- '<div class="style">
<input id="a" value="123">
<input id="b">
</div>'
h <- read_html(text)
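If you need this for several ids, the id shorthand wraps up nicely in a small function. This is only a sketch: input_value is a hypothetical helper written for this example, not something rvest provides.

```r
library(rvest)

# Hypothetical helper (not part of rvest): return the value attribute
# of the element with the given id, or NA if there is none
input_value <- function(page, id) {
  page %>%
    html_node(paste0("#", id)) %>%
    html_attr("value")
}

h <- read_html('<div class="style">
<input id="a" value="123">
<input id="b">
</div>')

val_a <- input_value(h, "a")  # input "a" has a value attribute
val_b <- input_value(h, "b")  # input "b" has none, so this is NA
```

Using html_node() rather than html_nodes() means each call returns exactly one result, so a missing element or attribute shows up as NA instead of an empty vector.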