I am using rvest, and I would like to convert the result to a data frame:
> links <- pgsession %>% jump_to(urls[2]) %>% read_html() %&
This will get you all the attributes from the links into a tbl_df. bind_rows gets you "fill" for free, padding any attribute a link lacks with NA:
library(rvest)
library(xml2)   # for xml_attrs(); rvest 1.0+ imports xml2 but does not attach it
library(dplyr)
pg <- read_html("https://en.wikipedia.org/wiki/Main_Page")
links <- html_nodes(pg, "a")
bind_rows(lapply(xml_attrs(links), function(x) data.frame(as.list(x), stringsAsFactors=FALSE)))
## Source: local data frame [310 x 10]
##
## id href title class dir accesskey rel lang hreflang style
## (chr) (chr) (chr) (chr) (chr) (chr) (chr) (chr) (chr) (chr)
## 1 top NA NA NA NA NA NA NA NA NA
## 2 NA #mw-head NA NA NA NA NA NA NA NA
## 3 NA #p-search NA NA NA NA NA NA NA NA
## 4 NA /wiki/Wikipedia Wikipedia NA NA NA NA NA NA NA
## 5 NA /wiki/Free_content Free content NA NA NA NA NA NA NA
## 6 NA /wiki/Encyclopedia Encyclopedia NA NA NA NA NA NA NA
## 7 NA /wiki/Wikipedia:Introduction Wikipedia:Introduction NA NA NA NA NA NA NA
## 8 NA /wiki/Special:Statistics Special:Statistics NA NA NA NA NA NA NA
## 9 NA /wiki/English_language English language NA NA NA NA NA NA NA
## 10 NA /wiki/Portal:Arts Portal:Arts NA NA NA NA NA NA NA
## .. ... ... ... ... ... ... ... ... ... ...
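To see the NA filling without hitting the network, here is a minimal sketch on a made-up three-link fragment. The markup and the names `attrs`, `rows` and `link_df` are assumptions for illustration only; the calls are the same ones used above.

```r
library(rvest)
library(xml2)   # attached explicitly for xml_attrs()
library(dplyr)

# Hypothetical fragment: each <a> carries a different set of attributes.
pg <- read_html('<p><a id="top"></a><a href="/wiki/A" title="A">A</a><a href="#x" class="nav">x</a></p>')
links <- html_nodes(pg, "a")

# xml_attrs() returns one named character vector per link.
attrs <- xml_attrs(links)

# Turn each vector into a one-row data frame, then stack them;
# bind_rows() pads the columns a given row does not have with NA.
rows <- lapply(attrs, function(x) data.frame(as.list(x), stringsAsFactors = FALSE))
link_df <- bind_rows(rows)

print(link_df)
```

One thing to watch for: a link with no attributes at all should produce a zero-row data frame here, so it would not show up in the result.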
Alternatively, you could use purrr:
library(rvest)
library(xml2)   # for xml_attrs(); rvest 1.0+ imports xml2 but does not attach it
library(purrr)
pg <- read_html("https://en.wikipedia.org/wiki/Main_Page")
html_nodes(pg, "a") %>%
map(xml_attrs) %>%
map_df(~as.list(.))
## # A tibble: 342 × 10
## id href title class dir accesskey rel hreflang lang style
##
## 1 top
## 2 #mw-head
## 3 #p-search
## 4 /wiki/Wikipedia Wikipedia
## 5 /wiki/Free_content Free content
## 6 /wiki/Encyclopedia Encyclopedia
## 7 /wiki/Wikipedia:Introduction Wikipedia:Introduction
## 8 /wiki/Special:Statistics Special:Statistics
## 9 /wiki/English_language English language
## 10 /wiki/Portal:Arts Portal:Arts
## # ... with 332 more rows
I think this is more functionally idiomatic and an overall cleaner approach.
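If you want to try the purrr route offline, a small variation is sketched below. It assumes a made-up two-link fragment and calls `xml_attrs()` on the whole node set (which already returns a list) before row-binding with `map_df()`. The name `link_tbl` is hypothetical, and `map_df()` needs dplyr to be installed even though it is not attached here.

```r
library(rvest)
library(xml2)   # attached explicitly for xml_attrs()
library(purrr)

# Hypothetical fragment with two links and differing attributes.
pg <- read_html('<p><a id="top"></a><a href="/wiki/A" title="A">A</a></p>')

# xml_attrs() on a node set gives a list of named character vectors;
# map_df() converts each to a list and row-binds them, filling gaps with NA.
link_tbl <- html_nodes(pg, "a") %>%
  xml_attrs() %>%
  map_df(as.list)

print(link_tbl)
```

The result should have one row per link and one column per distinct attribute name, in order of first appearance.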