How to use Tor socks5 in R getURL

99封情书 submitted on 2019-12-20 15:45:00

Question


I want to use Tor with the getURL function in R. Tor is running (verified in Firefox) and exposes its SOCKS5 proxy on port 9050. But when I set this proxy in R, I get the following error:

html <- getURL("http://www.google.com", followlocation = TRUE, .encoding = "UTF-8", .opts = list(proxy = "127.0.0.1:9050", timeout = 15))

Error in curlPerform(curl = curl, .opts = opts, .encoding = .encoding) :
  Tor is not an HTTP Proxy

  It appears you have configured your web browser to use Tor as an HTTP proxy.
  This is not correct: Tor is a SOCKS proxy, not an HTTP proxy.
  Please configure your client accordingly.

I've tried replacing the proxy option with socks and socks5, but neither worked.


Answer 1:


There are curl bindings for R, so you can use curl from R to talk to the Tor SOCKS5 proxy server.

The call from the shell (which you can translate to the R binding) is:

curl --socks5-hostname 127.0.0.1:9050 google.com

With --socks5-hostname the hostname is passed to the proxy unresolved, so Tor also performs the DNS lookup (A records).
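One possible translation of that shell call into R is sketched below. It assumes the curl package (a different binding from RCurl) is installed and that Tor listens on 127.0.0.1:9050 as in the question. The helper names tor_handle and fetch_via_tor are hypothetical, chosen only for illustration.

```r
# Build a handle that sends requests through Tor's SOCKS5 listener.
# The "socks5h" scheme is the proxy-URL counterpart of curl's
# --socks5-hostname flag: the proxy resolves the hostname.
tor_handle <- function(proxy = "socks5h://127.0.0.1:9050") {
  curl::new_handle(proxy = proxy, timeout = 15)
}

# Fetch a page through the proxy and return the response body as text.
fetch_via_tor <- function(url, handle = tor_handle()) {
  res <- curl::curl_fetch_memory(url, handle = handle)
  rawToChar(res$content)
}

# Example call (needs a running Tor client, so it is left commented out):
# html <- fetch_via_tor("http://www.google.com")
```

The proxy address is a function argument so that a different port can be passed in without editing the function body.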




Answer 2:


RCurl defaults to an HTTP proxy, but Tor provides a SOCKS proxy. Tor is clever enough to notice that the proxy client (RCurl) is trying to use it as an HTTP proxy, hence the HTML error message returned by Tor.

To get RCurl (and curl) to use a SOCKS proxy, add a protocol prefix to the proxy address. There are two prefixes for SOCKS5: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the hostname).
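The difference between the two prefixes can be captured in a small helper. This is only a sketch: socks_proxy_url is a hypothetical name, and the host and port defaults are taken from the question.

```r
# Build a libcurl proxy URL for a SOCKS5 listener.
# remote_dns = TRUE  -> "socks5h": the proxy (Tor) resolves hostnames
# remote_dns = FALSE -> "socks5":  the client resolves hostnames locally
socks_proxy_url <- function(host = "127.0.0.1", port = 9050, remote_dns = TRUE) {
  scheme <- if (remote_dns) "socks5h" else "socks5"
  sprintf("%s://%s:%d", scheme, host, as.integer(port))
}

socks_proxy_url()                    # "socks5h://127.0.0.1:9050"
socks_proxy_url(remote_dns = FALSE)  # "socks5://127.0.0.1:9050"
```

The resulting string can be passed as the proxy option to RCurl.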

Here is a pure R solution that also routes DNS queries through Tor.

library(RCurl)
options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)

If you want to specify additional parameters, see below on where to put them:

library(RCurl)
options(RCurlOptions = list(
  proxy          = "socks5h://127.0.0.1:9050",
  useragent      = "Mozilla",
  followlocation = TRUE,
  referer        = "",
  cookiejar      = "my.cookies.txt"
))
my.handle <- getCurlHandle()
html <- getURL(url='https://www.torproject.org', curl=my.handle)
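To confirm that requests really leave through Tor, one option is to query the Tor Project's check service through the same proxy. The sketch below assumes that https://check.torproject.org/api/ip returns a small JSON body containing an "IsTor" field. That endpoint and its response shape are assumptions here, and reports_tor and check_tor are hypothetical helper names.

```r
# Return TRUE if the JSON body reports IsTor as true.
# A regular expression keeps the sketch free of a JSON dependency.
reports_tor <- function(body) {
  grepl('"IsTor"[[:space:]]*:[[:space:]]*true', body)
}

# Fetch the check endpoint through the proxy and interpret the answer.
# The URL is an assumption; adjust it if the service differs.
check_tor <- function(proxy = "socks5h://127.0.0.1:9050",
                      url = "https://check.torproject.org/api/ip") {
  body <- RCurl::getURL(url, .opts = list(proxy = proxy, timeout = 15))
  reports_tor(body)
}

# Example call (needs RCurl and a running Tor client):
# check_tor()
```

Parsing is kept separate from fetching so the parsing step can be tried on a literal string without any network access.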



Answer 3:


Hi Naparst, I would really appreciate a hint on how to implement the solution you propose. The option should be something like opts <- list(socks5.hostname="127.0.0.1:9050"), but this doesn't work because socks5.hostname is not a valid option.




Answer 4:


On Mac OS X, install the Tor Bundle for Mac and Privoxy, then update the proxy settings in System Preferences.

'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Web Proxy (HTTP)' Web Proxy Server to 127.0.0.1:8118

'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Secure Web Proxy (HTTPS)' Secure Web Proxy Server to 127.0.0.1:8118 --> 'OK' --> 'Apply'

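library(RCurl)  # package names are case-sensitive in R
curl <- getCurlHandle()
# proxytype = 5 selects SOCKS5; 9150 is the SOCKS port used by the Tor Browser bundle
curlSetOpt(proxy = '127.0.0.1:9150', proxytype = 5, curl = curl)
html <- getURL(url = 'check.torproject.org', curl = curl)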
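The numeric proxytype used above corresponds to libcurl's curl_proxytype enum. The sketch below lists the relevant codes and shows how the hostname-resolving variant might be selected. It assumes RCurl accepts the raw integer code for proxytype, as the snippet above does with 5. curl_proxy_types and use_tor_socks are hypothetical names.

```r
# Proxy type codes from libcurl's curl_proxytype enum (curl.h).
curl_proxy_types <- c(
  HTTP            = 0,
  SOCKS4          = 4,
  SOCKS5          = 5,  # client resolves hostnames locally
  SOCKS4A         = 6,
  SOCKS5_HOSTNAME = 7   # proxy resolves hostnames, like the socks5h:// prefix
)

# Point an RCurl handle at a SOCKS5 proxy.
# Assumption: curlSetOpt accepts the integer proxy type code directly.
use_tor_socks <- function(handle, proxy = "127.0.0.1:9150", remote_dns = TRUE) {
  key <- if (remote_dns) "SOCKS5_HOSTNAME" else "SOCKS5"
  RCurl::curlSetOpt(proxy = proxy, proxytype = curl_proxy_types[[key]], curl = handle)
  invisible(handle)
}

# Example call (needs RCurl and a running Tor Browser):
# h <- use_tor_socks(RCurl::getCurlHandle())
# html <- RCurl::getURL("check.torproject.org", curl = h)
```

With remote_dns = TRUE the hostname lookup is left to Tor, which matches the behaviour of the "socks5h" prefix and of curl's --socks5-hostname flag.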


Source: https://stackoverflow.com/questions/17925234/how-to-use-tor-socks5-in-r-geturl
