How to pass ssl_verifypeer in Rvest?

Posted by 混江龙づ霸主 on 2019-12-23 10:12:29

Question


I'm trying to use Rvest to scrape a table off of an internal webpage here at $JOB. I've used the methods listed here to get the xpath, etc.

My code is pretty simple:

library(httr)
library(rvest)
un = "username"; pw = "password"

thexpath <- '//*[@id="theFormOnThePage"]/fieldset/table'

url1 <- "https://biglonghairyURL.do?blah=yadda"

stuff1 <- read_html(url1, authenticate(un, pw))

This gets me an error of: "Peer certificate cannot be authenticated with given CA certificates."

Leaving aside the not-up-to-datedness of the certificates, I've seen that with httr it's possible to skip the SSL verification using set_config(config(ssl_verifypeer = 0L)).

This works just peachy if I use GET(url1) from httr, but the whole point is to automate scraping of the table using rvest.
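For reference, here is a minimal sketch of the httr-only request that does go through. It passes config(ssl_verifypeer = 0L) to the single GET() call rather than setting it for the whole session with set_config(). The helper name fetch_unverified is hypothetical and only there so that nothing hits the network until it is called; the URL and credentials are the same placeholders as above.

```r
library(httr)

# Hypothetical helper (not part of httr or rvest): issue a GET with
# basic authentication and peer certificate verification switched off.
fetch_unverified <- function(url, user, password) {
  GET(
    url,
    authenticate(user, password),
    config(ssl_verifypeer = 0L)  # skip the CA check for this request only
  )
}

# Usage with the placeholders from the question:
# resp <- fetch_unverified(url1, un, pw)
# status_code(resp)
```

Disabling verification this way should only be considered for a trusted internal host, since it removes the protection the certificate check provides.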

Looking at the PDF manuals for rvest and httr, it seems that rvest calls httr to pass on the curl options, and that in httr you can set those with config().

So, to complete the syllogism: how can I pass ssl_verifypeer = 0L directly through rvest::read_html (or is that even possible)?

I've tried a fair number of variations:

stuff1 <- read_html(url1, authenticate(un, pw), ssl_verifypeer = 0L)
stuff1 <- read_html(url1, authenticate(un, pw), config(ssl_verifypeer = 0L))
stuff1 <- with_config(config = config(ssl_verifypeer = 0L), read_html(url1, authenticate(un, pw)))

And all of them throw the same error of "Peer certificate cannot be authenticated with given CA certificates."

Hopefully it's possible and I'm just not putting the correct syntax together?

Someone suggested using RSelenium, but since this is in a protected VM, getting Java and/or new packages installed takes an act of Congress (along with a VP signoff), so that would be my very last resort.

I very much appreciate any advice on this.

Source: https://stackoverflow.com/questions/34551299/how-to-pass-ssl-verifypeer-in-rvest
