Question
I need to fetch a certain part of data from multiple Wikipedia pages. How can I do that using the WikipediR package? Or is there some other, better option for this? To be precise, I need only the part marked in the screenshot below from all of the pages.
How can I get that? Any help would be appreciated.
Answer 1:
Can you be a little more specific as to what you want? Here's a simple way to import data from the web, and specifically from Wikipedia.
library(rvest)
scotusURL <- "https://en.wikipedia.org/wiki/List_of_Justices_of_the_Supreme_Court_of_the_United_States"
## ********************
## Option 1: Grab the tables from the page and use the html_table function to extract the tables you're interested in.
temp <- scotusURL %>%
  read_html() %>%      ## read_html() replaces the old html(), which is defunct in current rvest
  html_nodes("table")
html_table(temp[[1]]) ## Just the "legend" table
html_table(temp[[2]]) ## THE MAIN TABLE
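If you want to keep working with the main table, here is a minimal follow-up sketch continuing from the code above (it assumes the second table node is still the justices table; the page layout may have changed since this answer was written):

justices <- html_table(temp[[2]])  ## data frame built from the main table
head(justices)                     ## quick look at the first rows
str(justices)                      ## check column names and types before further processing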
Now, if you want to import data from multiple pages that have essentially the same structure and differ only by a page number (or similar parameter) in the URL, try the following approach.
library(RCurl); library(XML)
pageNum <- 1:10
url <- "http://www.totaljobs.com/JobSearch/Results.aspx?Keywords=Leadership&LTxt=&Radius=10&RateType=0&JobType1=CompanyType=&PageNum="
urls <- paste0(url, pageNum)                                 ## one URL per results page
allPages <- lapply(urls, function(x) getURLContent(x)[[1]])  ## download the raw HTML of each page
xmlDocs <- lapply(allPages, function(x) XML::htmlParse(x))   ## parse each page into an XML document
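From there you would extract whatever nodes you need from each parsed document. A hedged sketch follows; the XPath expression is only a placeholder, so inspect the real pages and replace it with one that matches the elements holding your data:

## Pull the text of every <h2> heading from each parsed page as a stand-in example.
pageHeadings <- lapply(xmlDocs, function(doc) {
  XML::xpathSApply(doc, "//h2", XML::xmlValue)
})
pageHeadings[[1]]  ## headings scraped from the first results page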
Source: https://stackoverflow.com/questions/31330540/how-to-get-data-from-wikipedia-page-using-wikipedir-package-in-r