Question
I want to scrape a table like this one (click Search and you will get a table of partners). I want to scrape the partner names. The problem is that I don't know what kind of table it is or how to scrape it.
I am using the RSelenium package. If it can be done with rvest, that would be even more helpful.
So what kind of table is this, is it possible to scrape it with RSelenium or rvest, and if so, how?
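# remDr is an already-open RSelenium remoteDriver session
ul <- "http://partnerlocator.symantec.com"
remDr$navigate(ul)
# Click the search button and wait for the results to load
webElem <- remDr$findElement(using = "class", value = "button")
webElem$clickElement()
Sys.sleep(10)
# Read the text of the results container
webElem <- remDr$findElement(using = "class", value = "results")
unlist(webElem$getElementText())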
But I get a single, hard-to-read text output like this:
CDW\nCDW\n200 North Milwaukee Avenue\nVernon Hills ,Illinois ,60061\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nCore Security - Platinum\nThreat Protection - Platinum\nCyber Security Services - Platinum\nInformation Protection - Platinum\nDLT Solutions\nDLT Solutions\n2411 Dulles Corner Park Suite 800\nHerndon ,Virginia ,20171\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nInformation Protection - Platinum\nThreat Protection - Platinum\nCore Security - Platinum\nCyber Security Services - Platinum\nInsight Direct USA\nInsight Direct USA\n3480 Lotus Drive\nPlano ,Texas ,75075\nUnited States\nDistance: 0 mi\nSymantec Platinum Partner\nCyber Security Services - Platinum\nCore Security - Platinum\nThreat Prot.........
Answer 1:
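The text returned for the results element looks like a fairly basic HTML table collapsed into a single string, with "\n" separating the cells. Splitting on "\n" expands it into one element per line: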
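library(RSelenium)

# Note: checkForServer() and startServer() come from older RSelenium releases;
# newer releases replace them (e.g. with rsDriver()), so adjust for your version.
checkForServer()
startServer()

ul <- "http://partnerlocator.symantec.com"
remDr <- remoteDriver()
remDr$open()
remDr$navigate(ul)

# Click the search button and wait for the results to load
webElem <- remDr$findElement(using = "class", value = "button")
webElem$clickElement()
Sys.sleep(10)

# Grab the text of the results container and split it into lines
webElem <- remDr$findElement(using = "class", value = "results")
results <- webElem$getElementText()
results_chr <- unlist(strsplit(results[[1]], "\n"))
head(results_chr)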
[1] "CDW" "CDW" "200 North Milwaukee Avenue"
[4] "Vernon Hills ,Illinois ,60061" "United States" "Distance: 0 mi"
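Since the question asks for the partner names specifically, here is a minimal sketch of one way to pull them out of results_chr. It relies on an assumption drawn only from the sample output above: each partner's block seems to begin with the name printed twice in a row, so a line that equals the next line is treated as the start of a record. The sample vector below is copied from the question's output (first two partners only) so the snippet runs without a browser; partner_names, starts and records are names introduced here for illustration.

```r
# Sample lines copied from the output in the question (first two partners only).
results_chr <- c(
  "CDW", "CDW", "200 North Milwaukee Avenue",
  "Vernon Hills ,Illinois ,60061", "United States", "Distance: 0 mi",
  "Symantec Platinum Partner", "Core Security - Platinum",
  "Threat Protection - Platinum", "Cyber Security Services - Platinum",
  "Information Protection - Platinum",
  "DLT Solutions", "DLT Solutions", "2411 Dulles Corner Park Suite 800",
  "Herndon ,Virginia ,20171", "United States", "Distance: 0 mi",
  "Symantec Platinum Partner", "Information Protection - Platinum",
  "Threat Protection - Platinum", "Core Security - Platinum",
  "Cyber Security Services - Platinum"
)

# Assumption: a record starts wherever a line is identical to the line after it.
n <- length(results_chr)
starts <- which(results_chr[-n] == results_chr[-1])

# The partner name is the first line of each record.
partner_names <- results_chr[starts]

# Optionally group all lines into one character vector per partner.
record_id <- cumsum(seq_along(results_chr) %in% starts)
records <- split(results_chr, record_id)

print(partner_names)
```

On the sample data this yields the two names CDW and DLT Solutions. The heuristic would misfire if two adjacent non-name lines happened to be identical, so it is worth checking the result against the page.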
You might be able to get a cleaner result by parsing the HTML of that results page with rvest, but I was unable to do so.
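For anyone who wants to try the rvest route, a rough sketch of the general pattern is to hand the rendered page source from RSelenium to rvest and select nodes with a CSS selector. The real markup of the results page is not shown in this thread, so the HTML string and the ".results .partner h3" selector below are entirely hypothetical stand-ins; inspect the live page in the browser's developer tools to find the real structure.

```r
library(rvest)

# Hypothetical markup standing in for the rendered results page.
# With a live session you would use: page_source <- remDr$getPageSource()[[1]]
page_source <- '
<div class="results">
  <div class="partner"><h3>CDW</h3><p>200 North Milwaukee Avenue</p></div>
  <div class="partner"><h3>DLT Solutions</h3><p>2411 Dulles Corner Park Suite 800</p></div>
</div>'

# Parse the HTML and pull the text of the (assumed) name nodes.
page <- read_html(page_source)
partner_names <- html_text(html_nodes(page, ".results .partner h3"))

print(partner_names)
```

The advantage of this approach, if the page has suitable markup, is that the names come from dedicated nodes instead of being inferred from line positions in a flattened string.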
Source: https://stackoverflow.com/questions/38049819/scraping-table-with-r-using-rselenium