Why does url.exists return FALSE when the URL does exist, using RCurl?

偶尔善良 submitted on 2020-01-25 21:42:08

Question


For example:

if(url.exists("http://www.google.com")) {
    # Two ways to submit a query to google. Searching for RCurl
    getURL("http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=RCurl&btnG=Search")
    # Here we let getForm do the hard work of combining the names and values.
    getForm("http://www.google.com/search", hl="en", lr="",ie="ISO-8859-1", q="RCurl", btnG="Search")
    # And here if we already have the parameters as a list/vector.
    getForm("http://www.google.com/search", .params = c(hl="en", lr="", ie="ISO-8859-1", q="RCurl", btnG="Search"))
}

This is an example from the RCurl package manual. However, it does not work:

> url.exists("http://www.google.com")
[1] FALSE

I found an answer to this in "Rcurl: url.exists returns false when url does exists". It says the cause is that the default user agent is not useful. But I do not understand what a user agent is or how to set one.

Also, this error occurred when I was working at my company. I tried the same code at home, and it worked fine. So I am guessing the cause is a proxy, or there may be some other reason that I have not realized.

I need to use RCurl to submit search queries to Google and then extract information such as titles and descriptions from the website. In this case, how do I use a user agent? Or can the httr package do this?


Answer 1:


Thanks a lot for the help, guys. I think I just figured out how to do it. The important thing is the proxy. If I use:

> opts <- list(
     proxy         = "http://*******",
     proxyusername = "*****", 
     proxypassword = "*****", 
     proxyport     = 8080
)
> url.exists("http://www.google.com",.opts = opts)
[1] TRUE
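
The question also asks how to set a user agent. The user agent is the string a client sends in the User-Agent HTTP header to identify itself, and RCurl accepts it through the same option list as the proxy settings. Below is a minimal sketch, assuming a hypothetical helper named make_curl_opts; the proxy address and the user agent string are placeholders, not real settings.

```r
# Hypothetical helper: build a list of curl options for RCurl calls.
# Every value passed in below is a placeholder, not a real setting.
make_curl_opts <- function(user_agent, proxy = NULL, port = NULL,
                           username = NULL, password = NULL) {
  opts <- list(
    useragent     = user_agent,
    proxy         = proxy,
    proxyport     = port,
    proxyusername = username,
    proxypassword = password
  )
  # Drop options that were not supplied so curl keeps its defaults for them.
  Filter(Negate(is.null), opts)
}

opts <- make_curl_opts(
  user_agent = "Mozilla/5.0 (compatible; my-r-script)",
  proxy      = "http://proxy.example.com",
  port       = 8080
)

# Thin wrapper so the network request only runs when it is called explicitly.
url_reachable <- function(url, opts) {
  RCurl::url.exists(url, .opts = opts)
}
# url_reachable("http://www.google.com", opts)
```

Keeping the options in one list means the same object can be handed to url.exists, getURL and getForm through their .opts argument.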

With the proxy settings in place, everything works. If you use Windows 10, you can find your proxy under System-->proxy. The same options also work with getForm:

 > site <- getForm("http://www.google.com.au", hl="en",
                 lr="", q="r-project", btnG="Search",.opts = opts)
 > htmlTreeParse(site)
 $file
 [1] "<buffer>"
 .........

The opts list needs to be passed to getForm as well. Two other posts (RCurl default proxy settings and Proxy setting for R) answer the same question. I have not yet tried extracting information from the result.
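
As a starting point for the extraction step, here is a minimal sketch using the XML package (the same package that provides htmlTreeParse). The HTML string is a made-up stand-in for the text returned by getForm, and the class name "result" and the h3/a layout are assumptions for illustration only; the markup of a real Google results page is different and changes over time, so the XPath expressions would need adjusting.

```r
library(XML)

# Made-up stand-in for the page text returned by getForm().
page <- '<html><head><title>r-project - Search</title></head>
<body>
  <div class="result">
    <h3><a href="https://www.r-project.org/">The R Project</a></h3>
    <p>R is a free software environment for statistical computing.</p>
  </div>
</body></html>'

# Parse the text into an internal document that XPath queries can run against.
doc <- htmlParse(page, asText = TRUE)

# Pull out the page title, then the link text, link target and description
# of each (hypothetical) result entry.
page_title <- xpathSApply(doc, "//title", xmlValue)
link_text  <- xpathSApply(doc, "//h3/a", xmlValue)
link_href  <- xpathSApply(doc, "//h3/a", xmlGetAttr, "href")
snippets   <- xpathSApply(doc, "//div[@class='result']/p", xmlValue)

print(data.frame(title = link_text, url = link_href, description = snippets))
```

htmlParse returns an internal document, which is what xpathSApply expects; the tree returned by htmlTreeParse with its default arguments is an R-level structure and cannot be queried with XPath in the same way.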



Source: https://stackoverflow.com/questions/40391047/why-url-exists-returns-false-when-the-url-does-exists-using-rcurl
