Using CURL with Google

温柔的废话 2021-01-07 07:04

I want to CURL to Google to see how many results it returns for a certain search.

I've tried this:

  $url = "http://www.google.com/search?q=" . urlencode($str);


        
5 answers
  • 2021-01-07 07:32

    Before scraping data, please read https://support.google.com/websearch/answer/86640?rd=1

    Automated searching is against Google's terms.

    Automated traffic includes:

    - Sending searches from a robot, computer program, automated service, or search scraper
    - Using software that sends searches to Google to see how a website or webpage ranks on Google

  • 2021-01-07 07:34

    Use a GET request instead of a POST request. That is, get rid of

    curl_setopt($ch, CURLOPT_POST, true);
    

    Or, even better, use their well-defined search API instead of screen-scraping.
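
    A minimal sketch of the GET approach, assuming the search URL from the question. cURL sends a GET request by default, so it is enough to never set CURLOPT_POST. The function names searchUrl() and fetchWithGet() are made up for illustration.

```php
<?php
// Build the search URL; urlencode() keeps spaces and symbols in the query valid.
function searchUrl(string $str): string
{
    return "http://www.google.com/search?q=" . urlencode($str);
}

// Fetch a URL with a plain GET request. cURL uses GET by default,
// so CURLOPT_POST is simply never set.
function fetchWithGet(string $url): string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects
    curl_setopt($ch, CURLOPT_TIMEOUT, 5);           // give up after 5 seconds

    $response = curl_exec($ch);
    if ($response === false) {
        throw new RuntimeException("cURL request failed: " . curl_error($ch));
    }
    return $response;
}
```

    Calling fetchWithGet(searchUrl("pizza")) would then return the raw HTML of the results page, or throw if the request fails.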

  • 2021-01-07 07:45

    Use the Google Ajax API.

    http://code.google.com/apis/ajaxsearch/

    See this thread for how to get the number of results. While it refers to C# libraries, it might give you some pointers.
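
    The AJAX Search API linked above has since been retired. As a rough sketch of the same idea, the code below reads the result count from the Custom Search JSON API instead. It assumes the endpoint https://www.googleapis.com/customsearch/v1 and the searchInformation.totalResults response field as described in Google's documentation; the API key and search engine ID are placeholders, and the function names are made up for illustration.

```php
<?php
// Build a request URL for the Custom Search JSON API.
// $apiKey and $engineId are placeholders for credentials obtained from Google.
function customSearchUrl(string $apiKey, string $engineId, string $query): string
{
    return "https://www.googleapis.com/customsearch/v1?" . http_build_query([
        "key" => $apiKey,
        "cx"  => $engineId,
        "q"   => $query,
    ], "", "&");
}

// Pull the estimated result count out of a JSON response body.
// Returns null when the field is missing or the body is not valid JSON.
function totalResults(string $json): ?int
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data["searchInformation"]["totalResults"])) {
        return null;
    }
    // The count is assumed to arrive as a string, so cast it.
    return (int) $data["searchInformation"]["totalResults"];
}
```

    The response body can be fetched with cURL like any other GET request and then passed to totalResults().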

  • 2021-01-07 07:45

    CURLOPT_CUSTOMREQUEST => ($post)? "POST" : "GET"
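
    The line above is one entry of a cURL option array. A small sketch of how it might sit in context, assuming a boolean $post flag chooses the method; requestOptions() and its $fields parameter are hypothetical names, not part of the original answer.

```php
<?php
// Build a cURL option array; $post selects the HTTP method.
function requestOptions(string $url, bool $post, array $fields = []): array
{
    $options = [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_CUSTOMREQUEST  => ($post) ? "POST" : "GET",
    ];
    if ($post) {
        // Only attach a request body when posting.
        $options[CURLOPT_POSTFIELDS] = http_build_query($fields, "", "&");
    }
    return $options;
}

// Usage: apply the options to a handle and run the request.
// $ch = curl_init();
// curl_setopt_array($ch, requestOptions("http://www.google.com/search?q=pizza", false));
// $response = curl_exec($ch);
```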

  • 2021-01-07 07:52

    Scraping Google is a very easy thing to do. However, if you don't require more than the first 30 results, then the search API is preferable (as others have suggested). Otherwise, here's some sample code. I've ripped this out of a couple of classes that I'm using, so it might not be totally functional as is, but you should get the idea.

    function queryToUrl($query, $start=null, $perPage=100, $country="US") {
        return "http://www.google.com/search?" . $this->_helpers->url->buildQuery(array(
            // Query
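            // (buildQuery() URL-encodes values, as http_build_query() does,
            // so calling urlencode() here would encode the query twice)
            "q"     => $query,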
            // Country (geolocation presumably)
            "gl"    => $country,
            // Start offset
            "start" => $start,
            // Number of result to a page
            "num"   => $perPage
        ), true);
    }
    
    // Find first 100 result for "pizza" in Canada
    $ch = curl_init(queryToUrl("pizza", 0, 100, "CA"));
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_USERAGENT,      $this->getUserAgent(/*$proxyIp*/));
    curl_setopt($ch, CURLOPT_MAXREDIRS,      4);
    curl_setopt($ch, CURLOPT_TIMEOUT,        5);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    
    $response = curl_exec($ch);
    

    Note: $this->_helpers->url->buildQuery() is identical to http_build_query except that it will drop empty parameters.
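
    For anyone without that helper class, here is a standalone sketch of what the note describes, assuming "empty" means null or an empty string (so a start offset of 0 is kept). The free function buildQuery() below is a stand-in written for illustration, not the original helper.

```php
<?php
// Like http_build_query(), but drops parameters that are null or "".
function buildQuery(array $params): string
{
    $kept = array_filter($params, function ($value) {
        return $value !== null && $value !== "";
    });
    return http_build_query($kept, "", "&");
}

// Standalone version of queryToUrl() that does not depend on $this.
function queryToUrl(string $query, ?int $start = null, int $perPage = 100, string $country = "US"): string
{
    return "http://www.google.com/search?" . buildQuery([
        "q"     => $query,   // encoded by http_build_query(), so no urlencode() here
        "gl"    => $country, // country
        "start" => $start,   // start offset; omitted from the URL when null
        "num"   => $perPage, // number of results per page
    ]);
}
```

    With these definitions, queryToUrl("pizza", 0, 100, "CA") can be passed straight to curl_init().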
