I want to use Tor in the getURL function in R. Tor is working (checked in Firefox), with SOCKS5 on port 9050. But when I set this in R, I get the f…
There are curl bindings for R (the RCurl package), which you can use to send requests through the Tor SOCKS5 proxy server.
The equivalent call from the shell, which you can translate to the R binding, is:
curl --socks5-hostname 127.0.0.1:9050 google.com
With --socks5-hostname, Tor also does the DNS resolution (including A records), so the hostname is not resolved locally.
By default RCurl assumes an HTTP proxy, but Tor provides a SOCKS proxy. Tor recognizes that the proxy client (RCurl) is speaking HTTP to it, which is why Tor returns an error message in HTML.
To get RCurl (and curl) to use a SOCKS proxy, you can put a protocol prefix in the proxy URL. There are two protocol prefixes for SOCKS5: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the hostname).
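Since the difference between the two prefixes is only the scheme of the proxy URL, it can be wrapped in a small function. The sketch below uses a hypothetical helper, `tor_proxy_opts()` (not part of RCurl), that builds an option list suitable for `RCurlOptions`. It assumes Tor listens on 127.0.0.1 and defaults to "socks5h" so that Tor resolves hostnames; any extra curl options are passed through unchanged.

```r
# Hypothetical helper (not part of RCurl): build an RCurl option list that
# routes requests through a local Tor SOCKS5 proxy.
#   port       - Tor's SOCKS port (9050 for the tor daemon; Tor Browser uses 9150)
#   remote_dns - TRUE uses "socks5h" (Tor resolves hostnames),
#                FALSE uses "socks5" (hostnames resolved locally, which Tor warns about)
#   ...        - further curl options, e.g. useragent = "Mozilla"
tor_proxy_opts <- function(port = 9050, remote_dns = TRUE, ...) {
  scheme <- if (remote_dns) "socks5h" else "socks5"
  c(list(proxy = sprintf("%s://127.0.0.1:%d", scheme, as.integer(port))),
    list(...))
}

# Usage with RCurl (requires a running Tor instance):
# library(RCurl)
# options(RCurlOptions = tor_proxy_opts(9050, followlocation = TRUE))
# html <- getURL("https://www.torproject.org", curl = getCurlHandle())
```

Keeping "socks5h" as the default means forgetting the argument errs on the side of not leaking DNS lookups.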
Here is a pure R solution that also uses Tor for DNS queries:
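library(RCurl)
# Route RCurl requests through Tor's SOCKS5 port; "socks5h" lets Tor resolve hostnames
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
# New handles pick up the RCurlOptions defaults
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)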
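To confirm that requests really leave through Tor, one option is to fetch a Tor check page through the same handle. The sketch below assumes the endpoint https://check.torproject.org/api/ip returns a small JSON object of the form {"IsTor":true,"IP":"..."}; `is_tor_response()` and `fetch_tor_status()` are hypothetical names, and the JSON is checked with a regular expression only to avoid an extra package dependency.

```r
# Hypothetical helpers; assume check.torproject.org/api/ip returns JSON
# of the form {"IsTor":true,"IP":"..."}.

# TRUE if the JSON text reports that the request came through Tor.
is_tor_response <- function(json) {
  grepl('"IsTor"\\s*:\\s*true', json, perl = TRUE)
}

# Fetch the status through a curl handle configured for Tor
# (needs the RCurl package and a running Tor instance).
fetch_tor_status <- function(curl = RCurl::getCurlHandle()) {
  json <- RCurl::getURL("https://check.torproject.org/api/ip", curl = curl)
  is_tor_response(json)
}

# Usage, after options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050")):
# fetch_tor_status()
```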
If you want to specify additional curl options, put them in the same list:
library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050",
                            useragent = "Mozilla",          # User-Agent header
                            followlocation = TRUE,          # follow HTTP redirects
                            referer = "",                   # Referer header
                            cookiejar = "my.cookies.txt"))  # file to write cookies to
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)
Hi Naparst, I would really appreciate a hint on how to implement the solution you propose. Should the option be something like opts <- list(socks5.hostname="127.0.0.1:9050")? (This doesn't work, since socks5.hostname is not an option.)
On Mac OS X, install the Tor Bundle for Mac and Privoxy, then update the proxy settings in System Preferences:
'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Web Proxy (HTTP)': Web Proxy Server 127.0.0.1:8118
'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Secure Web Proxy (HTTPS)': Secure Web Proxy Server 127.0.0.1:8118 --> 'OK' --> 'Apply'
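The system proxy settings only point applications at Privoxy on port 8118; Privoxy itself still has to forward requests to Tor. A minimal sketch of the relevant line in Privoxy's config file, assuming the Tor Bundle's SOCKS port 9150 (use 9050 for a standalone tor daemon). The trailing dot means there is no further HTTP parent proxy.

```
forward-socks5t / 127.0.0.1:9150 .
```

With that in place, R can also use Privoxy as an ordinary HTTP proxy, e.g. options(RCurlOptions = list(proxy = "http://127.0.0.1:8118")).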
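Then, in R, talk to the Tor Bundle's SOCKS port (9150) directly:
library(RCurl)
curl <- getCurlHandle()
# proxytype = 7 is libcurl's CURLPROXY_SOCKS5_HOSTNAME (Tor resolves hostnames);
# 5 (CURLPROXY_SOCKS5) would resolve them locally
curlSetOpt(proxy='127.0.0.1:9150', proxytype=7, curl=curl)
html <- getURL(url='https://check.torproject.org', curl=curl)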