Question
I am trying to scrape data about well-known people from LinkedIn, and I have run into a few problems. I would like to do the following:
- On Hadley Wickham's page (https://www.linkedin.com/in/hadleywickham/), I would like to use RSelenium to log in and click "Show 1 more education" and "Show 1 more experience". Hadley's profile only offers "Show 1 more education", but Ted Cruz's profile has a "Show 5 more experiences" option that I would also like to expand. Clicking these buttons should let me scrape the full education and experience sections from the page.
Code:
library(RSelenium)
library(rvest)
library(stringr)
library(xml2)
userID = "myEmailLogin" # The linkedIn email to login
passID = "myPassword" # and LinkedIn password
try(rsDriver(port = 4444L, browser = 'firefox'))
remDr <- remoteDriver()
remDr$open()
remDr$navigate("https://www.linkedin.com/login")
user <- remDr$findElement(using = 'id',"username")
user$sendKeysToElement(list(userID,key="tab"))
pass <- remDr$findElement(using = 'id',"password")
pass$sendKeysToElement(list(passID,key="enter"))
Sys.sleep(5) # give the page time to fully load
# Navigate to individual profiles
# remDr$navigate("https://www.linkedin.com/in/thejlo/") # Jennifer Lopez
# remDr$navigate("https://www.linkedin.com/in/cruzted/") # Ted Cruz
remDr$navigate("https://www.linkedin.com/in/hadleywickham/") # Hadley Wickham
Sys.sleep(5) # give the page time to fully load
html <- remDr$getPageSource()[[1]]
signals <- read_html(html)
personFullNameLocationXPath <- '/html/body/div[9]/div[3]/div/div/div/div/div[2]/main/div[1]/section/div[2]/div[2]/div[1]/ul[1]/li[1]'
personName <- signals %>%
html_nodes(xpath = personFullNameLocationXPath) %>%
html_text()
personTagLineXPath = '/html/body/div[9]/div[3]/div/div/div/div/div[2]/main/div[1]/section/div[2]/div[2]/div[1]/h2'
personTagLine <- signals %>%
html_nodes(xpath = personTagLineXPath) %>%
html_text()
personLocationXPath <- '//*[@id="ember49"]/div[2]/div[2]/div[1]/ul[2]/li[1]'
personLocation <- signals %>%
html_nodes(xpath = personLocationXPath) %>%
html_text()
personLocation %>%
gsub("[\r\n]", "", .) %>%
str_trim(.)
# Here is where I have problems
personExperienceTotalXPath = '//*[@id="experience-section"]/ul'
personExperienceTotal <- signals %>%
html_nodes(xpath = personExperienceTotalXPath) %>%
html_text()
The very end, personExperienceTotal, is where things go wrong: I cannot seem to scrape the experience-section. When I use my own LinkedIn URL (or some random person's), it seems to work.
My question is: how can I click the expand experience/education buttons and scrape these sections?
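To make the goal concrete, here is a minimal sketch of the kind of thing I am after, not a working solution. It assumes remDr is the logged-in RSelenium session from the code above. The helper names click_all and section_items are hypothetical, and the CSS selector in the usage comment is a guess that would need to be checked against the live page. The one point the sketch relies on is that the page source has to be re-read after clicking, since the HTML captured earlier will not contain the expanded entries.

```r
library(rvest)

# Click every element matching `css` in an existing RSelenium session.
# `remDr` is assumed to be an open remoteDriver; `css` is an assumed selector.
click_all <- function(remDr, css, pause = 1) {
  buttons <- remDr$findElements(using = "css selector", value = css)
  for (btn in buttons) {
    # Scroll the button into view first; off-screen elements may not be clickable.
    remDr$executeScript("arguments[0].scrollIntoView(true);", list(btn))
    btn$clickElement()
    Sys.sleep(pause)  # give the expanded content time to render
  }
  length(buttons)  # number of buttons clicked
}

# Return the whitespace-normalised text of each top-level list item
# in the section with the given id (same shape as my XPath above).
section_items <- function(html, section_id) {
  xpath <- sprintf('//*[@id="%s"]/ul/li', section_id)
  read_html(html) %>%
    html_nodes(xpath = xpath) %>%
    html_text() %>%
    gsub("\\s+", " ", .) %>%
    trimws()
}

# Usage (selector is an assumption, not verified):
# click_all(remDr, "button.pv-profile-section__see-more-inline")
# html <- remDr$getPageSource()[[1]]   # re-read AFTER clicking
# section_items(html, "experience-section")
```

Looking elements up by a stable id such as experience-section, rather than by an absolute path like /html/body/div[9]/... or a generated id like ember49, should also be less likely to break between profiles.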
Source: https://stackoverflow.com/questions/63784161/scraping-data-from-linkedin-using-rselenium-and-rvest