Question
I'm trying to use rvest to scrape a table off of an internal webpage here at $JOB. I've used the methods listed here to get the xpath, etc.
My code is pretty simple:
library(httr)
library(rvest)
un = "username"; pw = "password"
thexpath <- "//*[@id='theFormOnThePage']/fieldset/table"
url1 <- "https://biglonghairyURL.do?blah=yadda"
stuff1 <- read_html(url1, authenticate(un, pw))
This gets me an error of: "Peer certificate cannot be authenticated with given CA certificates."
Leaving aside the fact that the certificates are out of date, I've seen that it's possible in httr to skip the SSL verification using set_config(config(ssl_verifypeer = 0L)).
This works just peachy if I use GET(url1) from httr, but the whole point is to automate scraping of the table using rvest.
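For context, here is the httr-only version that does work for me once verification is disabled (a sketch; the URL and credentials are placeholders from above):

```r
library(httr)

un <- "username"; pw <- "password"
url1 <- "https://biglonghairyURL.do?blah=yadda"

# Turn off peer verification for this session -- acceptable here only
# because the internal certificates are known to be stale.
set_config(config(ssl_verifypeer = 0L))

# This GET succeeds, but it only gives me an httr response object,
# not the parsed table I'm after.
resp <- GET(url1, authenticate(un, pw))
```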
Looking at the PDFs for rvest and httr, it seems that rvest calls httr to issue the underlying curl requests, and that in httr you can use config().
So, to complete the syllogism: how can I (is it even possible to?) pass ssl_verifypeer = 0L directly through rvest::read_html()?
I've tried a fair number of variations:
stuff1 <- read_html(url1, authenticate(un, pw), ssl_verifypeer = 0L)
stuff1 <- read_html(url1, authenticate(un, pw), config(ssl_verifypeer = 0L))
stuff1 <- with_config(config = config(ssl_verifypeer = 0L), read_html(url1, authenticate(un, pw)))
And all of them throw the same error of "Peer certificate cannot be authenticated with given CA certificates."
Hopefully it's possible and I'm just not putting the correct syntax together?
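In case it helps frame an answer, the closest thing I can imagine working (an untested sketch, assuming read_html() accepts an httr response object, which the xml2 docs suggest) is to do the fetch entirely in httr, where config() is honored, and hand the result to rvest for parsing:

```r
library(httr)
library(rvest)

un <- "username"; pw <- "password"
url1 <- "https://biglonghairyURL.do?blah=yadda"

# Fetch with httr so that config() takes effect, then parse the response
# body with rvest. The xpath is the one captured earlier.
resp <- GET(url1, authenticate(un, pw), config(ssl_verifypeer = 0L))

stuff1 <- read_html(resp) %>%
  html_node(xpath = "//*[@id='theFormOnThePage']/fieldset/table") %>%
  html_table()
```

But I don't know whether that round-trip is the intended pattern, or whether there's a way to pass the option straight into read_html().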
Someone suggested using RSelenium, but since this is in a protected VM, getting java and/or new packages installed takes an act of Congress (along with a VP signoff) and would be my very last resort.
I very much appreciate any advice on this.
Source: https://stackoverflow.com/questions/34551299/how-to-pass-ssl-verifypeer-in-rvest