问题
For example:
if(url.exists("http://www.google.com")) {
# Two ways to submit a query to google. Searching for RCurl
getURL("http://www.google.com/search?hl=en&lr=&ie=ISO-8859-1&q=RCurl&btnG=Search")
# Here we let getForm do the hard work of combining the names and values.
getForm("http://www.google.com/search", hl="en", lr="",ie="ISO-8859-1", q="RCurl", btnG="Search")
# And here if we already have the parameters as a list/vector.
getForm("http://www.google.com/search", .params = c(hl="en", lr="", ie="ISO-8859-1", q="RCurl", btnG="Search"))
}
This is an example from RCurl package manual. However, it does not work:
> url.exists("http://www.google.com")
[1] FALSE
I found there is an answer to this here Rcurl: url.exists returns false when url does exists. It said this is because of the default user agent is not useful. But I do not understand what user agent is and how to use it.
Also, this error happened when I worked in my company. I tried the same code at home, and it worked find. So I am guessing this is because of proxy. Or there is some other reasons that I did not realize.
I need to use RCurl to search my queries from Google, and then extract the information such as title and descriptions from the website. In this case, how to use user agent? Or, does the package httr can do this?
回答1:
guys. Thanks a lot for help. I think I just figured out how to do it. The important thing is proxy. If I use:
> opts <- list(
proxy = "http://*******",
proxyusername = "*****",
proxypassword = "*****",
proxyport = 8080
)
> url.exists("http://www.google.com",.opts = opts)
[1] TRUE
Then all done! You can find your proxy under System-->proxy if you use win 10. At the same time:
> site <- getForm("http://www.google.com.au", hl="en",
lr="", q="r-project", btnG="Search",.opts = opts)
> htmlTreeParse(site)
$file
[1] "<buffer>"
.........
In getForm, opts needs to be put in as well. There are two posters here (RCurl default proxy settings and Proxy setting for R) answering the same question. I have not tried how to extract information from here.
来源:https://stackoverflow.com/questions/40391047/why-url-exists-returns-false-when-the-url-does-exists-using-rcurl