PHP cURL: Get target of redirect, without following it

梦毁少年i 2021-01-02 10:47

The curl_getinfo function returns a lot of metadata about the result of an HTTP request. However, for some reason it doesn't include the bit of information I want at the m

5 answers
  •  傲寒 (OP)
    2021-01-02 11:12

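    Older versions of PHP's cURL extension have no function or option to get the redirect target (PHP 5.3.7 added CURLINFO_REDIRECT_URL to curl_getinfo for this). Without it, the target can be extracted using various techniques: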

    From the response:

    Apache can respond with an HTML page in the case of a 301 redirect (this doesn't seem to be the case with 302s).

    If the response has a format similar to:

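    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>301 Moved Permanently</title>
    </head><body>
    <h1>Moved Permanently</h1>
    <p>The document has moved <a href="…">here</a>.</p>
    <hr>
    <address>Apache/2.2.16 (Debian) Server at www.xxx.yyy Port 80</address>
    </body></html>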

    You can extract the redirect URL using DOMXPath:

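    $results = [];
    foreach ($urls as $url) {
        // Only handle http:// and https:// URLs.
        if (substr($url, 0, 4) !== "http") {
            continue;
        }
        $c = curl_init($url);
        curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
        $result = curl_exec($c);   // false on transport failure
        $status = curl_getinfo($c, CURLINFO_HTTP_CODE);
        curl_close($c);
        $entry = ['code' => $status, 'url' => $url];

        if ($status === 301 && is_string($result) && $result !== '') {
            $xml = new DOMDocument();
            // Apache's error pages are old-style HTML, so silence parser warnings.
            @$xml->loadHTML($result);
            $xpath = new DOMXPath($xml);
            // The first element carrying an href is the "here" link.
            $href = $xpath->query("//*[@href]")->item(0);
            if ($href !== null) {
                $entry['target'] = $href->attributes->getNamedItem('href')->nodeValue;
            }
        }
        $results[] = $entry;
    }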
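
    The DOMXPath step can be tried without a network call. The following is only a sketch under assumptions: the helper name extractHrefFromRedirectBody is hypothetical, the sample body is modeled on the Apache page above, and the example.com URL is a placeholder. It needs PHP's DOM extension.

```php
<?php
// Hypothetical helper: returns the first href found in an HTML body, or null.
function extractHrefFromRedirectBody(string $html): ?string
{
    if ($html === '') {
        return null; // loadHTML() rejects an empty string
    }
    // Collect parser warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }
    $xpath = new DOMXPath($doc);
    $node = $xpath->query('//*[@href]')->item(0);
    return $node instanceof DOMElement ? $node->getAttribute('href') : null;
}

// Sample body modeled on Apache's 301 page; the URL is a placeholder.
$body = '<html><head><title>301 Moved Permanently</title></head><body>'
      . '<h1>Moved Permanently</h1>'
      . '<p>The document has moved <a href="http://example.com/new">here</a>.</p>'
      . '</body></html>';

echo extractHrefFromRedirectBody($body), PHP_EOL;
```

    Returning null when no href is present lets the caller tell "no link in the body" apart from a real target.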
    

    Using CURLOPT_NOBODY

    There is a faster way, however, as @gAMBOOKa points out: use CURLOPT_NOBODY. This option makes cURL send a HEAD request instead of a GET, so the body is never downloaded. Combined with CURLOPT_HEADER, the return value of curl_exec then contains only the response headers.

    Using a regex the target URL can be extracted from the header:

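    $results = [];
    foreach ($urls as $url) {
        // Only handle http:// and https:// URLs.
        if (substr($url, 0, 4) !== "http") {
            continue;
        }
        $c = curl_init($url);
        curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($c, CURLOPT_NOBODY, true);   // send HEAD instead of GET
        curl_setopt($c, CURLOPT_HEADER, true);   // include response headers in the return value
        $result = curl_exec($c);                 // false on transport failure
        $status = curl_getinfo($c, CURLINFO_HTTP_CODE);
        curl_close($c);
        $entry = ['code' => $status, 'url' => $url];

        // Read only the Location header, so other URLs in the headers are ignored.
        // A relative target is returned exactly as the server sent it.
        if (in_array($status, [301, 302, 303, 307, 308], true) && is_string($result)
            && preg_match('/^Location:[ \t]*(.*)$/mi', $result, $m)) {
            $entry['target'] = trim($m[1]);
        }
        $results[] = $entry;
    }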
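
    The header extraction can be checked against a canned response. This is a sketch rather than a definitive parser: the helper name extractLocationHeader and the sample headers are hypothetical. It matches only a line that starts with Location, case-insensitively, so URLs in other headers are not picked up.

```php
<?php
// Hypothetical helper: returns the value of the first Location header, or null.
function extractLocationHeader(string $rawHeaders): ?string
{
    // "m" anchors ^ and $ to each line; "i" because header names are case-insensitive.
    if (preg_match('/^Location:[ \t]*(.*)$/mi', $rawHeaders, $m) !== 1) {
        return null;
    }
    // Strip the trailing "\r" left over from the CRLF line ending.
    $target = trim($m[1]);
    return $target === '' ? null : $target;
}

// Canned HEAD response; every value here is made up for illustration.
$headers = "HTTP/1.1 301 Moved Permanently\r\n"
         . "Link: <http://example.com/style.css>; rel=preload\r\n"
         . "location: /new/path?x=1\r\n"
         . "Content-Length: 0\r\n\r\n";

echo extractLocationHeader($headers), PHP_EOL;
```

    The value is returned as sent, so a relative target such as the one in the sample still has to be resolved against the request URL by the caller.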
    
