RSelenium: Scraping links on page

Submitted by 一世执手 on 2019-12-13 07:39:14

Question


I'm relatively new to RSelenium. I have successfully managed to log into a site from which I need to pull all the web links.

That overview page looks like this:

<a title="Search 'A2A'" href="/search?company=a2a&amp;rf=13">A2A</a>
<a title="Search 'ABB'" href="/search?company=abb&amp;rf=13">ABB</a>
<a title="Search 'Achmea'" href="/search?company=achmea&amp;rf=13">Achmea</a>

etc... this continues for another ~6000 links

I have tried to use the following line to grab all the links, but this has not worked:

remDr$findElement(using="link text", value="href")

I'd be very grateful if someone could show me how to grab all the links, including the company names, such as 'A2A', 'ABB', 'Achmea', etc.

Regards, mr_bungles


Answer 1:


I suggest you use 'rvest' and 'tidyverse' along with RSelenium.

library(tidyverse)
library(rvest)

url <- 'add your url here'

# Parse the page into an HTML document
pg <- read_html(url)

# One row per matched node: its visible text and its href attribute
tbl <- tibble(
    text = pg %>% html_nodes('add css selector here') %>% html_text(),
    link = pg %>% html_nodes('add css selector here') %>% html_attr('href')
)
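
Since the page in the question sits behind a login, a plain read_html(url) makes a fresh request without the browser's session and may not see the same content. One way to combine the two packages is to hand rvest the HTML that RSelenium has already loaded. Below is a minimal sketch of that idea, using the three sample anchors from the question as a stand-in for the real page. The `a[href]` selector and the names `html`, `anchors`, and `links` are assumptions for illustration; on the real page the selector would probably need narrowing to the company list.

```r
library(rvest)

# Stand-in markup copied from the question. In a live, logged-in session the
# string would instead come from the browser, e.g.:
#   html <- remDr$getPageSource()[[1]]
html <- paste0(
  '<a title="Search \'A2A\'" href="/search?company=a2a&amp;rf=13">A2A</a>',
  '<a title="Search \'ABB\'" href="/search?company=abb&amp;rf=13">ABB</a>',
  '<a title="Search \'Achmea\'" href="/search?company=achmea&amp;rf=13">Achmea</a>'
)

# Parse the HTML string and select every anchor that carries an href
pg <- read_html(html)
anchors <- html_nodes(pg, "a[href]")

# Collect the company name (link text) and the target of each anchor
links <- data.frame(
  text = html_text(anchors),
  link = html_attr(anchors, "href"),
  stringsAsFactors = FALSE
)

print(links)
```

Selecting the nodes once and reusing `anchors` keeps the text and link columns aligned row by row. The hrefs come back relative, as they appear in the markup, so they would still need the site's base URL prepended before being visited.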


Source: https://stackoverflow.com/questions/45531169/rselenium-scraping-links-on-page
