I want to use cURL to query Google and see how many results it returns for a given search.
I've tried this:
$url = "http://www.google.com/search?q=" . urlencode($str);
Before scraping data, please read https://support.google.com/websearch/answer/86640?rd=1
This is against Google's terms. Automated traffic includes:
- Sending searches from a robot, computer program, automated service, or search scraper
- Using software that sends searches to Google to see how a website or webpage ranks on Google
Use a GET request instead of a POST request. That is, remove this line:
curl_setopt($ch, CURLOPT_POST, true);
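A minimal sketch of the question's request done as a plain GET, assuming the search terms are in a string like `$str`. A fresh cURL handle already uses GET, so dropping the POST line is enough; `CURLOPT_HTTPGET` just makes the intent explicit. The function names `searchUrl` and `googleSearchGet` are hypothetical.

```php
<?php
// Hypothetical helper: build the search URL. With GET, the query travels
// in the URL itself, and urlencode() escapes spaces and symbols.
function searchUrl(string $str): string
{
    return "http://www.google.com/search?q=" . urlencode($str);
}

// Hypothetical helper: fetch the results page with a plain GET request.
// Returns the response body, or null if cURL reports a failure.
function googleSearchGet(string $str): ?string
{
    $ch = curl_init(searchUrl($str));
    curl_setopt($ch, CURLOPT_HTTPGET, true);        // explicit GET; no CURLOPT_POST
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body as a string
    $body = curl_exec($ch);
    return $body === false ? null : $body;
}
```

For example, `searchUrl("pizza hut")` yields `http://www.google.com/search?q=pizza+hut`.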
Or, even better, use their well-defined search API instead of screen-scraping.
Use the Google AJAX Search API:
http://code.google.com/apis/ajaxsearch/
See this thread for how to get the number of results. While it refers to C# libraries, it may give you some pointers.
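A rough sketch of reading the result count from an AJAX Search API JSON response. It assumes the count sits at `responseData.cursor.estimatedResultCount`, which is how that API (since deprecated by Google) laid out its responses; the function name `estimatedResultCount` and the sample JSON are hypothetical.

```php
<?php
// Hypothetical helper: pull the estimated result count out of a JSON response.
// ASSUMPTION: the layout responseData.cursor.estimatedResultCount.
function estimatedResultCount(string $json): ?int
{
    $data = json_decode($json, true);
    if (!is_array($data)) {
        return null; // not valid JSON
    }
    // ?? tolerates missing keys and a null responseData (error responses).
    $count = $data['responseData']['cursor']['estimatedResultCount'] ?? null;
    // The count may arrive as a numeric string, so cast it when present.
    return is_numeric($count) ? (int) $count : null;
}

// Hypothetical sample response, not real API output.
$sample = '{"responseData":{"results":[],"cursor":{"estimatedResultCount":"1234"}},"responseStatus":200}';
var_dump(estimatedResultCount($sample)); // int(1234)
```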
CURLOPT_CUSTOMREQUEST => ($post)? "POST" : "GET"
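The line above is an entry in an options array for `curl_setopt_array()`. A small sketch of that pattern, where a hypothetical boolean `$post` picks the method (the helper name `requestOptions` is also hypothetical):

```php
<?php
// Hypothetical helper: cURL options where $post selects the HTTP method.
function requestOptions(bool $post): array
{
    return [
        CURLOPT_CUSTOMREQUEST  => $post ? "POST" : "GET",
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
    ];
}

$ch = curl_init("http://www.google.com/search?q=pizza");
curl_setopt_array($ch, requestOptions(false)); // configures a GET request
```

Note that `CURLOPT_CUSTOMREQUEST` only changes the method name sent in the request line; it does not make libcurl send a body, so a real POST would still need `CURLOPT_POSTFIELDS`.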
Scraping Google is very easy to do. However, if you don't need more than the first 30 results, the search API is preferable (as others have suggested). Otherwise, here's some sample code. I've ripped it out of a couple of classes I'm using, so it might not be fully functional as is, but you should get the idea.
function queryToUrl($query, $start=null, $perPage=100, $country="US") {
    return "http://www.google.com/search?" . $this->_helpers->url->buildQuery(array(
        // Query (buildQuery(), like http_build_query(), URL-encodes values itself)
        "q" => $query,
        // Country (geolocation, presumably)
        "gl" => $country,
        // Start offset
        "start" => $start,
        // Number of results per page
        "num" => $perPage
    ), true);
}
// Find the first 100 results for "pizza" in Canada
$ch = curl_init(queryToUrl("pizza", 0, 100, "CA"));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($ch, CURLOPT_USERAGENT, $this->getUserAgent(/*$proxyIp*/));
curl_setopt($ch, CURLOPT_MAXREDIRS, 4);
curl_setopt($ch, CURLOPT_TIMEOUT, 5);
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
$response = curl_exec($ch);
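Once `$response` holds the page, the count the question asks about still has to be pulled out of the HTML. A fragile sketch, assuming the page contains text such as "About 1,230,000 results" (Google's markup changes often, so this may stop matching); the function name `parseResultCount` is hypothetical.

```php
<?php
// Hypothetical helper: extract the "About N results" count from a results page.
// ASSUMPTION: the HTML contains text like "About 1,230,000 results".
function parseResultCount(string $html): ?int
{
    // Optional "About ", a number with thousands separators, then "result(s)".
    if (preg_match('/(?:About\s+)?(\d[\d,]*)\s+results?\b/i', $html, $m) !== 1) {
        return null;
    }
    // Strip the separators before converting to an integer.
    return (int) str_replace(',', '', $m[1]);
}
```

Usage would be `parseResultCount($response)` when `$response` is not `false`.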
Note: $this->_helpers->url->buildQuery() is identical to http_build_query() except that it drops empty parameters.
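Outside the author's classes, a standalone stand-in could look like the sketch below. The `buildQuery` function here is hypothetical: it treats "empty" as `null` or `""` only, so that `start=0` survives (a plain `array_filter()` without a callback would also drop `0`).

```php
<?php
// Hypothetical stand-in for $this->_helpers->url->buildQuery():
// http_build_query() after dropping null and "" parameters.
function buildQuery(array $params): string
{
    $params = array_filter($params, function ($value) {
        return $value !== null && $value !== '';
    });
    return http_build_query($params);
}

// Standalone queryToUrl() built on the helper above.
function queryToUrl(string $query, ?int $start = null, int $perPage = 100, string $country = "US"): string
{
    return "http://www.google.com/search?" . buildQuery([
        "q"     => $query, // http_build_query() does the URL-encoding
        "gl"    => $country,
        "start" => $start,
        "num"   => $perPage,
    ]);
}

echo queryToUrl("pizza", 0, 100, "CA"), "\n";
// http://www.google.com/search?q=pizza&gl=CA&start=0&num=100
```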