Question
I want to use Tor with the getURL function in R. Tor is working (checked in Firefox) as a SOCKS5 proxy on port 9050. But when I set this in R, I get the following error:
html <- getURL("http://www.google.com", followlocation = T, .encoding="UTF-8", .opts = list(proxy = "127.0.0.1:9050", timeout=15))
Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) : '\n\nTor is not an HTTP Proxy\n\n\n
Tor is not an HTTP Proxy
\n\nIt appears you have configured your web browser to use Tor as an HTTP proxy.\nThis is not correct: Tor is a SOCKS proxy, not an HTTP proxy.\nPlease configure your client accordingly.
I've tried replacing proxy with socks and socks5, but it didn't work.
Answer 1:
There are curl bindings for R; with them you can have curl call the Tor SOCKS5 proxy server.
The call from the shell (which you can translate to the R binding) is:
curl --socks5-hostname 127.0.0.1:9050 google.com
With --socks5-hostname, Tor also performs the DNS lookup (A records), so the hostname is not resolved locally.
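One way to translate that shell call into R is sketched below. It assumes the CRAN package curl is the binding in question (RCurl would work as well), and the helper names tor_handle and fetch_via_tor are made up for illustration. The "socks5h" prefix plays the role of --socks5-hostname, so the proxy resolves hostnames.

```r
# Sketch using the CRAN package `curl` (an assumption; not named in the answer).
# Hypothetical helper: a handle that routes requests through Tor's SOCKS5 port.
tor_handle <- function(proxy = "socks5h://127.0.0.1:9050") {
  # "socks5h" asks the proxy to resolve hostnames, like --socks5-hostname
  curl::new_handle(proxy = proxy, followlocation = TRUE, timeout = 15)
}

# Needs a running Tor daemon, so the request is wrapped in a function.
fetch_via_tor <- function(url, handle = tor_handle()) {
  res <- curl::curl_fetch_memory(url, handle = handle)
  rawToChar(res$content)  # response body as a single string
}
```

With Tor listening on port 9050, fetch_via_tor("http://google.com") should return the page source as a string.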
Answer 2:
RCurl defaults to an HTTP proxy, but Tor provides a SOCKS proxy. Tor is clever enough to detect that the proxy client (RCurl) is trying to use it as an HTTP proxy, hence the HTML error message returned by Tor.
To get RCurl (and curl) to use a SOCKS proxy, you can add a protocol prefix to the proxy address. There are two protocol prefixes for SOCKS5: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the hostname).
Here is a pure R solution that uses Tor for DNS queries as well:
library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)
If you want to specify additional options, put them in the same list:
library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050",
                            useragent = "Mozilla",
                            followlocation = TRUE,
                            referer = "",
                            cookiejar = "my.cookies.txt"
                            )
        )
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)
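To confirm that requests really leave through Tor, one option is to fetch the Tor check page through the same proxy and look at the result. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the check page still greets Tor users with the word "Congratulations".

```r
# Hypothetical helper: TRUE if the check page's HTML reports a Tor connection.
# Assumes the success page contains the word "Congratulations".
is_tor_page <- function(html) {
  grepl("Congratulations", html, fixed = TRUE)
}

# Needs a running Tor daemon and the RCurl package, so it is wrapped in a function.
check_tor <- function(proxy = "socks5h://127.0.0.1:9050") {
  html <- RCurl::getURL("https://check.torproject.org/",
                        .opts = list(proxy = proxy,
                                     followlocation = TRUE,
                                     timeout = 30))
  is_tor_page(html)
}
```

Calling check_tor() should return TRUE when the proxy setting is in effect and FALSE when the request went out directly.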
Answer 3:
Hi Naparst, I would really appreciate a hint on how to implement the solution you propose. The option should be something like opts <- list(socks5.hostname="127.0.0.1:9050"), but this doesn't work, since socks5.hostname is not a valid option.
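A possible way to express --socks5-hostname in RCurl is the proxytype option rather than a dedicated option name. The sketch below assumes libcurl's numeric value 7 (CURLPROXY_SOCKS5_HOSTNAME) is passed through unchanged by RCurl; the helper names are hypothetical.

```r
# Hypothetical helper: RCurl options meant to mirror `curl --socks5-hostname`.
socks5_hostname_opts <- function(host = "127.0.0.1", port = 9050) {
  list(
    proxy = paste0(host, ":", port),
    proxytype = 7L  # libcurl's CURLPROXY_SOCKS5_HOSTNAME: the proxy resolves DNS
  )
}

# Needs a running Tor daemon and the RCurl package, so it is wrapped in a function.
fetch_with_rcurl <- function(url) {
  RCurl::getURL(url, followlocation = TRUE, .opts = socks5_hostname_opts())
}
```

Writing the proxy as "socks5h://127.0.0.1:9050" should have the same effect without the numeric code.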
Answer 4:
Under Mac OS X, install the Tor Bundle for Mac and Privoxy, and then update the proxy settings in System Preferences:
'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Web Proxy (HTTP)' Web Proxy Server 127.0.0.1:8118
'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Secure Web Proxy (HTTPS)' Secure Web Proxy Server 127.0.0.1:8118 --> 'OK' --> 'Apply'
library(RCurl)
curl <- getCurlHandle()
curlSetOpt(proxy='127.0.0.1:9150',proxytype=5,curl=curl)
html <- getURL(url='check.torproject.org',curl=curl)
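The snippet above talks to the Tor Bundle's SOCKS port directly. If the goal is instead to go through Privoxy, a sketch like the following might apply. It assumes Privoxy listens as an HTTP proxy on 127.0.0.1:8118 (the port from the system settings above) and has been configured separately to forward traffic to Tor; the helper names are hypothetical.

```r
# Hypothetical helper: RCurl options for Privoxy's HTTP listener (assumed on 8118).
privoxy_opts <- function(port = 8118) {
  list(proxy = paste0("127.0.0.1:", port),  # plain HTTP proxy, no SOCKS prefix
       followlocation = TRUE,
       timeout = 30)
}

# Needs Privoxy, Tor and the RCurl package, so it is wrapped in a function.
fetch_via_privoxy <- function(url) {
  RCurl::getURL(url, .opts = privoxy_opts())
}
```

Because Privoxy speaks HTTP, RCurl's default proxy type is enough here and no proxytype setting is needed.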
Source: https://stackoverflow.com/questions/17925234/how-to-use-tor-socks5-in-r-geturl