How do I scrape html tables using the XML package?
Take, for example, this wikipedia page on the Brazilian soccer team. I would like to read it in R and get the \"li
The rvest
along with xml2
is another popular package for parsing html web pages.
library(rvest)
theurl <- "http://en.wikipedia.org/wiki/Brazil_national_football_team"
file<-read_html(theurl)
tables<-html_nodes(file, "table")
table1 <- html_table(tables[4], fill = TRUE)
The syntax is easier to use than the xml
package and for most web pages the package provides all of the options ones needs.