How to get the cookies from a PHP cURL response into a variable

囚心锁ツ 2020-11-22 15:08

So some guy at some other company thought it would be awesome if, instead of using SOAP or XML-RPC or REST or any other reasonable communication protocol, he just embedded all

8 answers
  • 2020-11-22 15:40

    Someone here may find this useful. hhb_curl_exec2 works much like curl_exec, but the third argument is an array that is populated with the returned HTTP headers (numerically indexed), the fourth is an array that is populated with the returned cookies (e.g. $cookies["expires"] => "Fri, 06-May-2016 05:58:51 GMT"), and the fifth is populated with... info about the raw request made by curl.

    The downside is that it requires CURLOPT_RETURNTRANSFER to be on, otherwise it errors out, and that it overwrites CURLOPT_STDERR and CURLOPT_VERBOSE if you were already using them for something else (I might fix this later).

    Example of how to use it:

    <?php
    header("content-type: text/plain;charset=utf-8");
    $ch=curl_init();
    $headers=array();
    $cookies=array();
    $debuginfo="";
    $body="";
    // Disabling peer verification is insecure; acceptable only for a quick test.
    curl_setopt($ch,CURLOPT_SSL_VERIFYPEER,false);
    // hhb_curl_exec2 needs the response returned as a string.
    curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
    $body=hhb_curl_exec2($ch,'https://www.youtube.com/',$headers,$cookies,$debuginfo);
    var_dump('$cookies:',$cookies,'$headers:',$headers,'$debuginfo:',$debuginfo,'$body:',$body);
    

    And the function itself:

    function hhb_curl_exec2($ch, $url, &$returnHeaders = array(), &$returnCookies = array(), &$verboseDebugInfo = "")
    {
        $returnHeaders    = array();
        $returnCookies    = array();
        $verboseDebugInfo = "";
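        // PHP 8 returns a CurlHandle object from curl_init(); older versions return a resource.
        if (!($ch instanceof CurlHandle) && !(is_resource($ch) && get_resource_type($ch) === 'curl')) {
            throw new InvalidArgumentException('$ch must be a curl handle!');
        }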
        if (!is_string($url)) {
            throw new InvalidArgumentException('$url must be a string!');
        }
        $verbosefileh = tmpfile();
        $verbosefile  = stream_get_meta_data($verbosefileh);
        $verbosefile  = $verbosefile['uri'];
        curl_setopt($ch, CURLOPT_VERBOSE, 1);
        curl_setopt($ch, CURLOPT_STDERR, $verbosefileh);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        $html             = hhb_curl_exec($ch, $url);
        $verboseDebugInfo = file_get_contents($verbosefile);
        curl_setopt($ch, CURLOPT_STDERR, NULL);
        fclose($verbosefileh);
        unset($verbosefile, $verbosefileh);
        $headers       = array();
        $crlf          = "\x0d\x0a";
        $thepos        = strpos($html, $crlf . $crlf, 0);
        $headersString = substr($html, 0, $thepos);
        $headerArr     = explode($crlf, $headersString);
        $returnHeaders = $headerArr;
        unset($headersString, $headerArr);
        $htmlBody = substr($html, $thepos + 4); //should work on utf8/ascii headers... utf32? not so sure..
        unset($html);
        //I REALLY HOPE THERE EXIST A BETTER WAY TO GET COOKIES.. good grief this looks ugly..
        //at least it's tested and seems to work perfectly...
        $grabCookieName = function($str)
        {
            $ret = "";
            $i   = 0;
            for ($i = 0; $i < strlen($str); ++$i) {
                if ($str[$i] === ' ') {
                    continue;
                }
                if ($str[$i] === '=') {
                    break;
                }
                $ret .= $str[$i];
            }
            return urldecode($ret);
        };
        foreach ($returnHeaders as $header) {
            //Set-Cookie: crlfcoookielol=crlf+is%0D%0A+and+newline+is+%0D%0A+and+semicolon+is%3B+and+not+sure+what+else
            /*Set-Cookie:ci_spill=a%3A4%3A%7Bs%3A10%3A%22session_id%22%3Bs%3A32%3A%22305d3d67b8016ca9661c3b032d4319df%22%3Bs%3A10%3A%22ip_address%22%3Bs%3A14%3A%2285.164.158.128%22%3Bs%3A10%3A%22user_agent%22%3Bs%3A109%3A%22Mozilla%2F5.0+%28Windows+NT+6.1%3B+WOW64%29+AppleWebKit%2F537.36+%28KHTML%2C+like+Gecko%29+Chrome%2F43.0.2357.132+Safari%2F537.36%22%3Bs%3A13%3A%22last_activity%22%3Bi%3A1436874639%3B%7Dcab1dd09f4eca466660e8a767856d013; expires=Tue, 14-Jul-2015 13:50:39 GMT; path=/
            Set-Cookie: sessionToken=abc123; Expires=Wed, 09 Jun 2021 10:18:14 GMT;
            //Cookie names cannot contain any of the following '=,; \t\r\n\013\014'
            //
            */
            if (stripos($header, "Set-Cookie:") !== 0) {
                continue;
                /**/
            }
            $header = trim(substr($header, strlen("Set-Cookie:")));
            while (strlen($header) > 0) {
                $cookiename                 = $grabCookieName($header);
                $returnCookies[$cookiename] = '';
                $header                     = substr($header, strlen($cookiename) + 1); //also remove the = 
                if (strlen($header) < 1) {
                    break;
                }
                $thepos = strpos($header, ';');
                if ($thepos === false) { //last cookie in this Set-Cookie.
                    $returnCookies[$cookiename] = urldecode($header);
                    break;
                }
                $returnCookies[$cookiename] = urldecode(substr($header, 0, $thepos));
                $header                     = trim(substr($header, $thepos + 1)); //also remove the ;
            }
        }
        unset($header, $cookiename, $thepos);
        return $htmlBody;
    }
    
    function hhb_curl_exec($ch, $url)
    {
        static $hhb_curl_domainCache = "";
        //$hhb_curl_domainCache=&$this->hhb_curl_domainCache;
        //$ch=&$this->curlh;
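        // PHP 8 returns a CurlHandle object from curl_init(); older versions return a resource.
        if (!($ch instanceof CurlHandle) && !(is_resource($ch) && get_resource_type($ch) === 'curl')) {
            throw new InvalidArgumentException('$ch must be a curl handle!');
        }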
        if (!is_string($url)) {
            throw new InvalidArgumentException('$url must be a string!');
        }
    
        $tmpvar = "";
        if (parse_url($url, PHP_URL_HOST) === null) {
            if (substr($url, 0, 1) !== '/') {
                $url = $hhb_curl_domainCache . '/' . $url;
            } else {
                $url = $hhb_curl_domainCache . $url;
            }
        }
    
        curl_setopt($ch, CURLOPT_URL, $url);
        $html = curl_exec($ch);
        if (curl_errno($ch)) {
            throw new Exception('Curl error (curl_errno=' . curl_errno($ch) . ') on url ' . var_export($url, true) . ': ' . curl_error($ch));
            // echo 'Curl error: ' . curl_error($ch);
        }
        if ($html === '' && 204 != ($tmpvar = curl_getinfo($ch, CURLINFO_HTTP_CODE)) /*204 is "No Content": success, but no output*/ ) {
            throw new Exception('Curl returned nothing for ' . var_export($url, true) . ' but HTTP_RESPONSE_CODE was ' . var_export($tmpvar, true));
        }
        //remember that curl (usually) auto-follows the "Location: " http redirects..
        $hhb_curl_domainCache = parse_url(curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), PHP_URL_HOST);
        return $html;
    }
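
    On the hope expressed in the code comments that a better way exists: one option is CURLOPT_HEADERFUNCTION, which makes curl hand each response header line to a callback, so the body never has to be split from the headers by hand. What follows is only a minimal sketch. makeCookieCollector is a hypothetical helper name, and it keeps just the first name=value pair of each Set-Cookie line, ignoring attributes such as expires or path.

```php
<?php
// Hypothetical helper (not part of hhb_curl_exec2): returns a callback suitable
// for CURLOPT_HEADERFUNCTION that stores name => value pairs in $cookies.
function makeCookieCollector(array &$cookies): callable
{
    return function ($ch, string $headerLine) use (&$cookies): int {
        // Header names are case-insensitive; capture only the first name=value pair.
        if (preg_match('/^Set-Cookie:\s*([^=;\s]+)=([^;\r\n]*)/i', $headerLine, $m)) {
            $cookies[$m[1]] = urldecode($m[2]);
        }
        // curl requires the callback to return the number of bytes it handled.
        return strlen($headerLine);
    };
}

// Usage with a real handle (needs network access, so it is left commented out):
// $cookies = array();
// $ch = curl_init('https://www.example.com/');
// curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// curl_setopt($ch, CURLOPT_HEADERFUNCTION, makeCookieCollector($cookies));
// $body = curl_exec($ch);
// var_dump($cookies);
```

    Because the callback only ever sees header lines, a body that happens to contain "Set-Cookie" cannot produce a false match.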
    
  • 2020-11-22 15:42
    $ch = curl_init('http://www.google.com/');
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    // get headers too with this line
    curl_setopt($ch, CURLOPT_HEADER, 1);
    $result = curl_exec($ch);
    // get cookie
    // multi-cookie variant contributed by @Combuster in comments
    preg_match_all('/^Set-Cookie:\s*([^;]*)/mi', $result, $matches);
    $cookies = array();
    foreach($matches[1] as $item) {
        parse_str($item, $cookie);
        $cookies = array_merge($cookies, $cookie);
    }
    var_dump($cookies);
    
  • 2020-11-22 15:43

    My understanding is that cookies from curl must be written out to a file (curl -c cookie_file). If you're running curl through PHP's exec or system functions (or anything in that family), you should be able to save the cookies to a file, then open the file and read them in.
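
    If you go that route, the file curl writes with -c uses the Netscape cookie format: one cookie per line with seven tab-separated fields (domain, subdomain flag, path, secure flag, expiry, name, value). Comment lines start with #, except that HttpOnly cookies carry a #HttpOnly_ prefix on the domain. Here is a minimal sketch of reading such a file into a variable; parseCookieJar and the jar path are hypothetical names chosen for illustration.

```php
<?php
// Hypothetical helper: parses the text of a Netscape-format cookie jar
// (as written by `curl -c`) into a name => value array.
function parseCookieJar(string $jarText): array
{
    $cookies = array();
    foreach (preg_split('/\r\n|\r|\n/', $jarText) as $line) {
        // HttpOnly cookies are stored with a "#HttpOnly_" prefix on the domain.
        if (strpos($line, '#HttpOnly_') === 0) {
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or comment
        }
        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        // Fields: domain, subdomain flag, path, secure flag, expiry, name, value
        $cookies[$fields[5]] = $fields[6];
    }
    return $cookies;
}

// Usage (assumes the curl binary is installed and /tmp is writable):
// $jar = '/tmp/cookies.txt';
// exec('curl -s -o /dev/null -c ' . escapeshellarg($jar) . ' ' . escapeshellarg('https://www.example.com/'));
// $cookies = parseCookieJar(file_get_contents($jar));
```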

  • 2020-11-22 15:44

    The accepted answer appears to search the entire response message. That could produce false matches if the body contains a line beginning with "Set-Cookie", although it should be fine in most cases. A safer way is to read the message from the start up to the first empty line, which marks the end of the message headers. This alternative looks for the first blank line and then runs preg_grep on the header lines only to find "Set-Cookie".

        //Return the response as a string, headers included
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($ch, CURLOPT_HEADER, 1);
        $res = curl_exec($ch);
        //Split into lines
        $lines = explode("\n", $res);
        $headers = array();
        $cookies = array();
        $body = "";
        foreach($lines as $num => $line){
            $l = str_replace("\r", "", $line);
            //Empty line indicates the start of the message body and end of headers
            if(trim($l) == ""){
                $headers = array_slice($lines, 0, $num);
                //Everything after the blank line is the body, not just its first line
                $body = implode("\n", array_slice($lines, $num + 1));
                //Pull only cookies out of the headers (header names are case-insensitive)
                $cookies = preg_grep('/^Set-Cookie:/i', $headers);
                break;
            }
        }
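
    Another way to find where the headers end, offered here as an assumption rather than something taken from the answers: after curl_exec, curl_getinfo($ch, CURLINFO_HEADER_SIZE) reports the total byte length of the header section, so substr can separate headers from body without scanning for a blank line. splitResponse is a hypothetical helper name in this sketch.

```php
<?php
// Hypothetical helper: splits a raw response (fetched with CURLOPT_HEADER on)
// at $headerSize and returns the headers, the Set-Cookie lines, and the body.
function splitResponse(string $raw, int $headerSize): array
{
    $headerText = substr($raw, 0, $headerSize);
    $body       = (string) substr($raw, $headerSize);
    // Drop empty lines, including the blank line that ends the header block.
    $headers = array_values(array_filter(
        preg_split('/\r\n|\n/', $headerText),
        'strlen'
    ));
    // Header names are case-insensitive, hence the i flag.
    $cookies = array_values(preg_grep('/^Set-Cookie:/i', $headers));
    return array('headers' => $headers, 'cookies' => $cookies, 'body' => $body);
}

// Usage with a handle that already has CURLOPT_RETURNTRANSFER and CURLOPT_HEADER set:
// $res   = curl_exec($ch);
// $parts = splitResponse($res, curl_getinfo($ch, CURLINFO_HEADER_SIZE));
// var_dump($parts['cookies']);
```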
    
  • 2020-11-22 15:49

    This does it without regexps, but requires the PECL HTTP extension (the 1.x series, which provides http_parse_headers and http_parse_cookie).

    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_HEADER, 1);
    $result = curl_exec($ch);
    curl_close($ch);
    
    $headers = http_parse_headers($result);
    $cookobjs = Array();
    foreach($headers AS $k => $v){
        if (strtolower($k)=="set-cookie"){
            // A single Set-Cookie header arrives as a string, several as an array.
            foreach((array)$v AS $k2 => $v2){
                $cookobjs[] = http_parse_cookie($v2);
            }
        }
    }
    
    $cookies = Array();
    foreach($cookobjs AS $row){
        $cookies[] = $row->cookies;
    }
    
    $tmp = Array();
    // sort k=>v format
    foreach($cookies AS $v){
        foreach ($v  AS $k1 => $v1){
            $tmp[$k1]=$v1;
        }
    }
    
    $cookies = $tmp;
    print_r($cookies);
    
  • 2020-11-22 15:50

    libcurl also provides CURLINFO_COOKIELIST, the read-side counterpart of CURLOPT_COOKIELIST, which returns all cookies known to the handle. All you need is to make sure the PHP/cURL binding exposes it.
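
    In PHP the cookie list is read with curl_getinfo($ch, CURLINFO_COOKIELIST), which returns one tab-separated Netscape-format line per cookie (name and value are the last two of the seven fields). The cookie engine has to be enabled first, for example by setting CURLOPT_COOKIEFILE to an empty string. A minimal sketch; cookieListToMap and fetchCookies are hypothetical helper names.

```php
<?php
// Hypothetical helper: turns the lines returned by CURLINFO_COOKIELIST
// into a name => value array.
function cookieListToMap(array $lines): array
{
    $cookies = array();
    foreach ($lines as $line) {
        $fields = explode("\t", $line);
        if (count($fields) === 7) {
            $cookies[$fields[5]] = $fields[6];
        }
    }
    return $cookies;
}

// Hypothetical helper: performs a request and returns the cookies it set.
function fetchCookies(string $url): array
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // An empty file name enables curl's in-memory cookie engine.
    curl_setopt($ch, CURLOPT_COOKIEFILE, '');
    curl_exec($ch);
    return cookieListToMap(curl_getinfo($ch, CURLINFO_COOKIELIST));
}

// Usage (needs network access): var_dump(fetchCookies('https://www.example.com/'));
```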
