Question
I'm relatively new to RSelenium. I have successfully managed to log into a site from which I need to pull all the links.
The overview page looks like this:
<a title="Search 'A2A'" href="/search?company=a2a&rf=13">A2A</a>
<a title="Search 'ABB'" href="/search?company=abb&rf=13">ABB</a>
<a title="Search 'Achmea'" href="/search?company=achmea&rf=13">Achmea</a>
...and so on for roughly another 6,000 links.
I tried the following line to grab all the links, but it did not work:
remDr$findElement(using="link text", value="href")
I'd be very grateful if someone could show me how to grab all the links, including the company names, such as 'A2A', 'ABB', 'Achmea', etc.
Regards, mr_bungles
Answer 1:
I suggest using rvest and tidyverse alongside RSelenium.
library(tidyverse)
library(rvest)
url <- 'add your url here'
pg <- read_html(url)
# Build a table of link text and link URL (note the comma between the two columns)
tbl <- tibble(
  text = pg %>% html_nodes('add css selector here') %>% html_text(),
  link = pg %>% html_nodes('add css selector here') %>% html_attr('href')
)
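Because the page sits behind a login, the HTML can be taken from the live RSelenium session instead of calling read_html() on the URL directly. Below is a minimal sketch of that combination; it assumes the remote driver object is called remDr and that the anchors can be matched with the CSS selector a[title^="Search"], which is only a guess based on the markup shown in the question:

library(tidyverse)
library(rvest)

# Assumes remDr is an already-authenticated RSelenium remote driver
# currently on the overview page. getPageSource() returns a list whose
# first element is the full HTML of the current page.
pg <- read_html(remDr$getPageSource()[[1]])

# The selector a[title^="Search"] is an assumption taken from the sample
# markup (<a title="Search 'A2A'" ...>); adjust it to match the real page.
links <- pg %>% html_nodes('a[title^="Search"]')

tbl <- tibble(
  company = links %>% html_text(),       # e.g. "A2A", "ABB", "Achmea"
  link    = links %>% html_attr('href')  # e.g. "/search?company=a2a&rf=13"
)

The resulting tbl has one row per anchor, with the company name and its relative URL side by side.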
Source: https://stackoverflow.com/questions/45531169/rselenium-scraping-links-on-page