Question
I'm using R in a commercial environment where external connectivity all goes via a web proxy, so we need to specify the proxy server address and ensure we connect to it with Windows authentication.
I already have code that will configure the RCurl and httr packages to use those settings by default - i.e.
httr::set_config(httr::config(
  proxy = "my.proxy.address",
  proxyuserpwd = ":",
  proxyauth = 4
))
or
opts <- list(
  proxy = "my.proxy.address",
  proxyuserpwd = ":",
  proxyauth = 4
)
# options() is a base R function; RCurl reads the RCurlOptions entry
options(RCurlOptions = opts)
However, in a couple of cases recently, I've found packages that depend on the curl package to make web requests (for instance xml2::read_xml), and I can't find any way to set the same proxy options so that they're picked up by default and used by curl.
If I use curl directly myself, I can set the options on a new handle and the following code is sufficient to work successfully:
# url holds the address to fetch
h <- curl::new_handle(proxy = "my.proxy.address",
                      proxyuserpwd = ":")
con <- curl::curl(url, handle = h)
page <- xml2::read_xml(con)
... but this isn't any help when the use of curl is buried within someone else's function!
Alternatively, I know I can set up an environment variable for the proxy address, like this:
Sys.setenv(https_proxy = "https://my.proxy.address")
... and libcurl picks it up. But if I do just this, then I end up with an HTTP 407 proxy authentication error. Is there a way to specify blank username / password (as the proxyuserpwd setting does), so we authenticate with Windows credentials? It also doesn't seem possible to specify the proxyauth option as an environment variable.
Can anyone offer a solution or any suggestions, please?
Answer 1:
I was having similar issues. Here are the steps that worked for me:
- Download my company's proxy auto-config file (PAC file). For IE: click the gear icon --> internet options --> Connections --> LAN Settings --> copy the http address into a new browser window to download the text file.
- Locate the line in the PAC file specifying the proxy (eg: "auth-proxy.xxxxxxx.com:9999")
In a new R session, test these proxy settings by temporarily setting them with a command similar to the following, substituting your values from your PAC file:
Sys.setenv(http_proxy = "auth-proxy.xxxxxxx.com:9999")
Sys.setenv(https_proxy = "auth-proxy.xxxxxxx.com:9999")
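As a minimal sketch of the step above, the two calls can be wrapped in a small helper that also returns the previous values, so you can confirm what the session sees and undo the change if it doesn't help. The function name set_proxy_env is hypothetical (it is not part of any package), and the host and port are the same placeholders as above. It uses base R only.

```r
# Hypothetical helper: set both proxy variables for the current session
# and return their previous values (NA where a variable was unset).
set_proxy_env <- function(proxy) {
  old <- Sys.getenv(c("http_proxy", "https_proxy"), unset = NA)
  Sys.setenv(http_proxy = proxy, https_proxy = proxy)
  invisible(old)
}

# Placeholder address: substitute the value from your own PAC file
old <- set_proxy_env("auth-proxy.xxxxxxx.com:9999")

# Confirm that the session now sees the new values
print(Sys.getenv(c("http_proxy", "https_proxy")))
```

Keeping the old values around makes it easy to put things back with Sys.setenv or Sys.unsetenv if the test request still fails.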
Rerun your code in the same session to see if these new settings solve the issue. This is the test I used.
xml2::read_html(curl::curl("http://google.com", handle = curl::new_handle(useragent = "Mozilla/5.0")))
Setting the proxy using Sys.setenv will only persist until the end of your current session. To make a more permanent change, you may consider adding these lines to your R_PROFILE, as explained here.
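One possible way to do that is sketched below, assuming your profile is the usual per-user ~/.Rprofile file that R sources at startup. The helper name add_proxy_to_profile is hypothetical, and the example writes to a temporary file rather than your real profile so that you can inspect the result first.

```r
# Hypothetical helper: append the two Sys.setenv() lines to a profile file.
# The default path assumes the per-user ~/.Rprofile.
add_proxy_to_profile <- function(proxy, profile = path.expand("~/.Rprofile")) {
  lines <- c(
    sprintf('Sys.setenv(http_proxy = "%s")', proxy),
    sprintf('Sys.setenv(https_proxy = "%s")', proxy)
  )
  con <- file(profile, open = "a")  # append, keep any existing content
  on.exit(close(con))
  writeLines(lines, con)
  invisible(profile)
}

# Try it on a temporary file first (placeholder proxy address)
profile <- tempfile(fileext = ".R")
add_proxy_to_profile("auth-proxy.xxxxxxx.com:9999", profile)
written <- readLines(profile)
print(written)
```

Once the output looks right, calling the helper without the second argument appends the same lines to ~/.Rprofile, and they take effect in the next R session.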
Source: https://stackoverflow.com/questions/53011866/how-to-configure-the-curl-package-in-r-with-default-web-proxy-settings