How to use Tor socks5 in R getURL

终归单人心 2021-02-06 09:45

I want to use Tor with the getURL function in R. Tor is working (checked in Firefox), with SOCKS5 on port 9050. But when I set this in R, I get the f

4 Answers
  • 2021-02-06 09:54

    R has curl bindings (for example, the RCurl package), so you can use curl from R to call the Tor SOCKS5 proxy server.

    The call from the shell (which you can translate to the R binding) is:

    curl --socks5-hostname 127.0.0.1:9050 google.com

    With --socks5-hostname, Tor also performs the DNS lookup (A records), so the hostname is not resolved locally.
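
One way the shell call above might translate to R is sketched below. It assumes the `curl` package as the binding (the answer does not name one), and `tor_proxy_url` and `fetch_via_tor` are hypothetical helper names introduced only for illustration. The `socks5h` scheme asks the proxy to resolve hostnames, which mirrors `--socks5-hostname`.

```r
# Sketch: `curl --socks5-hostname 127.0.0.1:9050 <url>` expressed in R.
# Assumption: the 'curl' package is installed and Tor listens on port 9050.

# Build the proxy URL; "socks5h" lets the SOCKS server resolve hostnames.
tor_proxy_url <- function(host = "127.0.0.1", port = 9050L) {
  sprintf("socks5h://%s:%d", host, port)
}

# Hypothetical helper: fetch a page through the Tor SOCKS5 proxy
# and return the response body as a character string.
fetch_via_tor <- function(url, proxy = tor_proxy_url()) {
  handle <- curl::new_handle(proxy = proxy)
  response <- curl::curl_fetch_memory(url, handle = handle)
  rawToChar(response$content)
}

# Usage (needs a running Tor client):
# html <- fetch_via_tor("https://www.torproject.org")
```

Keeping the network call inside a function means nothing is sent until you call it explicitly.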

  • 2021-02-06 10:10

    RCurl defaults to an HTTP proxy, but Tor provides a SOCKS proxy. Tor detects that the proxy client (RCurl) is trying to use it as an HTTP proxy, which is why it returns an error message in HTML.

    To make RCurl (and curl) use a SOCKS proxy, add a protocol prefix to the proxy address. There are two prefixes for SOCKS5: "socks5" and "socks5h" (see the curl manual). The latter lets the SOCKS server handle DNS queries, which is the preferred method when using Tor (in fact, Tor will warn you if you let the proxy client resolve the hostname).

    Here is a pure R solution that uses Tor for DNS queries.

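    library(RCurl)
    # Set the proxy globally; the "socks5h" prefix makes Tor resolve hostnames
    options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050"))
    # Handles created afterwards pick up the global RCurlOptions
    my.handle <- getCurlHandle()
    html <- getURL(url='https://www.torproject.org', curl=my.handle)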
    

    If you want to specify additional parameters, see below on where to put them:

    library(RCurl)
    options(RCurlOptions = list(proxy = "socks5h://127.0.0.1:9050",
                                useragent = "Mozilla",
                                followlocation = TRUE,
                                referer = "",
                                cookiejar = "my.cookies.txt"))
    my.handle <- getCurlHandle()
    html <- getURL(url='https://www.torproject.org', curl=my.handle)
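
If changing the global `RCurlOptions` is undesirable, the same settings can probably be passed per request through the `.opts` argument of `getURL`. The sketch below only builds the option list; `tor_opts` is a hypothetical helper name, and the option values are taken from the example above.

```r
# Sketch: collect the Tor proxy and any extra curl options in one list,
# without touching options(RCurlOptions = ...).
# Hypothetical helper; extra named arguments are appended as curl options.
tor_opts <- function(proxy = "socks5h://127.0.0.1:9050", ...) {
  c(list(proxy = proxy), list(...))
}

opts <- tor_opts(useragent = "Mozilla", followlocation = TRUE)

# Usage (needs RCurl and a running Tor client):
# html <- RCurl::getURL("https://www.torproject.org", .opts = opts)
```

Only the requests that receive `opts` go through the proxy, so other RCurl calls in the same session keep their own settings.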
    
  • 2021-02-06 10:14

    Hi Naparst, I would really appreciate a hint on how to implement the solution you propose. The option should be something like opts <- list(socks5.hostname="127.0.0.1:9050"), but this doesn't work because socks5.hostname is not an option.

  • 2021-02-06 10:15

    On Mac OS X, install the Tor Bundle for Mac and Privoxy, then update the proxy settings in System Preferences.

    'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Web Proxy (HTTP)' Web Proxy Server 127.0.0.1:8118

    'System Preferences' --> 'Wi-Fi' --> 'Advanced' --> 'Proxies' --> set 'Secure Web Proxy (HTTPS)' Secure Web Proxy Server 127.0.0.1:8118 --> 'OK' --> 'Apply'

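    library(RCurl)  # package names are case-sensitive
    curl <- getCurlHandle()
    # proxytype = 5 selects SOCKS5; 9150 is the SOCKS port of the Tor Browser bundle
    curlSetOpt(proxy = '127.0.0.1:9150', proxytype = 5, curl = curl)
    html <- getURL(url = 'check.torproject.org', curl = curl)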
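
The numeric `proxytype` corresponds to libcurl's proxy type constants: 5 is SOCKS5 with local hostname resolution and 7 is SOCKS5 with resolution by the proxy. The sketch below names these values and builds an option list for `curlSetOpt`. `curl_proxy_type` and `tor_handle_opts` are hypothetical names, and whether a given RCurl build accepts the value 7 is an assumption that depends on the underlying libcurl version.

```r
# libcurl proxy type constants (numeric values as defined in curl.h)
curl_proxy_type <- c(socks4 = 4L, socks5 = 5L, socks4a = 6L, socks5_hostname = 7L)

# Hypothetical helper: options for a handle that talks to a local Tor SOCKS port.
# remote_dns = TRUE picks the variant in which the proxy resolves hostnames.
tor_handle_opts <- function(port = 9150L, remote_dns = TRUE) {
  type <- if (remote_dns) "socks5_hostname" else "socks5"
  list(proxy = sprintf("127.0.0.1:%d", port),
       proxytype = curl_proxy_type[[type]])
}

# Usage (needs RCurl and a running Tor client):
# handle <- RCurl::getCurlHandle()
# RCurl::curlSetOpt(.opts = tor_handle_opts(), curl = handle)
# html <- RCurl::getURL("check.torproject.org", curl = handle)
```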
    