PHP cURL: Get target of redirect, without following it

梦毁少年i 2021-01-02 10:47

The curl_getinfo function returns a lot of metadata about the result of an HTTP request. However, for some reason it doesn't include the bit of information I want at the m

5 Answers
  • 2021-01-02 10:54

    I had the same problem, and curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); was of no help.

    So, I decided not to use CURL but file_get_contents instead:

    $data = file_get_contents($url);
    $data = str_replace("<meta http-equiv=\"Refresh\" content=\"0;","<meta",$data);
    

    The last line blocks the meta-refresh redirection, although the result is no longer clean HTML.

    I parsed the data and could retrieve the redirection URL I wanted to get.

  • 2021-01-02 11:07

    No, there is no more efficient way.
    You can use CURLOPT_WRITEHEADER + VariableStream,
    so you can write the headers to a variable and then parse them.
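
    A minimal sketch of the "headers into a variable, then parse" idea. Note that it swaps in CURLOPT_HEADERFUNCTION with a closure instead of CURLOPT_WRITEHEADER plus a stream wrapper, because the callback needs no extra class. The function names find_location_header and fetch_raw_headers are made up for illustration.

```php
<?php
/**
 * Pull the Location value out of a raw header block, or return null.
 * (Hypothetical helper, not part of the cURL extension.)
 */
function find_location_header(string $rawHeaders): ?string
{
    // Header names are case-insensitive; /m lets ^ and $ match per line.
    if (preg_match('/^Location:[ \t]*(.*?)[ \t]*\r?$/mi', $rawHeaders, $m)) {
        return $m[1] !== '' ? $m[1] : null;
    }
    return null;
}

/**
 * Request $url without following redirects and collect the response
 * headers in a plain string. (Hypothetical helper.)
 */
function fetch_raw_headers(string $url): string
{
    $raw = '';
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body off stdout
    curl_setopt($ch, CURLOPT_HEADERFUNCTION, function ($ch, string $line) use (&$raw): int {
        $raw .= $line;
        return strlen($line); // cURL expects the number of bytes handled
    });
    curl_exec($ch);
    return $raw;
}
```

    Usage would be along the lines of find_location_header(fetch_raw_headers($url)). A relative Location value is returned as-is and still has to be resolved against the request URL.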

    0 讨论(0)
  • 2021-01-02 11:10

    You can simply use CURLINFO_REDIRECT_URL:

    $info = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    echo $info; // the redirect URL without following it
    

    As you mentioned, disable the CURLOPT_FOLLOWLOCATION option before executing the request, and place this code after curl_exec().

    CURLINFO_REDIRECT_URL - With the CURLOPT_FOLLOWLOCATION option disabled: redirect URL found in the last transaction, that should be requested manually next. With the CURLOPT_FOLLOWLOCATION option enabled: this is empty. The redirect URL in this case is available in CURLINFO_EFFECTIVE_URL

    Reference
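
    Putting those calls in order, a minimal sketch might look like the following. The function name redirect_target is an assumption made for illustration, and the sketch treats a failed request the same as "no redirect".

```php
<?php
/**
 * Return the URL that $url redirects to, or null if there is no redirect
 * or the request fails. (Hypothetical helper.)
 */
function redirect_target(string $url): ?string
{
    $ch = curl_init($url);
    // Must be off before curl_exec(), otherwise CURLINFO_REDIRECT_URL is empty.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // keep the body off stdout

    if (curl_exec($ch) === false) {
        return null; // transport error, nothing to report
    }

    // Only meaningful after the request has been executed.
    $target = curl_getinfo($ch, CURLINFO_REDIRECT_URL);
    return is_string($target) && $target !== '' ? $target : null;
}
```

    Calling redirect_target($url) on a page that answers 200 should give null, since no redirect URL is recorded for the transaction.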

  • 2021-01-02 11:11

    This can be done in 4 easy steps:

    Step 1. Initialise curl

    $ch = curl_init(); //initialise the curl handle
    //COOKIESESSION is optional; it marks this as a new cookie session
    curl_setopt($ch, CURLOPT_COOKIESESSION, true);
    

    Step 2. Get the headers for $url

    curl_setopt($ch, CURLOPT_URL, $url); //specify your URL
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); //return the response instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, true); //include headers in http data
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); //don't follow redirects
    $http_data = curl_exec($ch); //hit the $url
    $curl_info = curl_getinfo($ch);
    $headers = substr($http_data, 0, $curl_info['header_size']); //split out header
    

    Step 3. Check if you have the correct response code

    if (!($curl_info['http_code']>299 && $curl_info['http_code']<309)) {
      //return, echo, die, whatever you like
      return 'Error - http code '.$curl_info['http_code'].' received.';
    }
    

    Step 4. Parse the headers to get the new URL

    //header names are case-insensitive, hence the "i" modifier
    if (preg_match("!\r\n(?:Location|URI): *(.*?) *\r\n!i", $headers, $matches)) {
      $url = $matches[1];
    }
    

    Once you have the new URL you can then repeat steps 2-4 as often as you like.
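
    The "repeat steps 2-4" loop could be sketched as below, with a hop limit and a loop check so that a misbehaving server cannot keep it running forever. This is only an illustration: trace_redirects and the $fetch callable are hypothetical names, and $fetch is assumed to perform steps 2-4 for one URL and return ['http_code' => int, 'location' => ?string].

```php
<?php
/**
 * Follow a redirect chain by hand and return every URL visited, in order.
 *
 * $fetch is a hypothetical callable standing in for steps 2-4: given a URL
 * it returns ['http_code' => int, 'location' => ?string].
 */
function trace_redirects(string $url, callable $fetch, int $maxHops = 10): array
{
    $chain = [$url];
    for ($hop = 0; $hop < $maxHops; $hop++) {
        $response = $fetch($url);
        $code = $response['http_code'];
        $next = $response['location'] ?? null;

        // Same range as step 3: only 300..308 counts as a redirect.
        if ($code < 300 || $code > 308 || $next === null || $next === '') {
            break;
        }
        // Stop on a loop instead of requesting a URL we already visited.
        if (in_array($next, $chain, true)) {
            break;
        }
        $chain[] = $next;
        $url = $next;
    }
    return $chain;
}
```

    A $fetch built from the steps above would run the request and return $curl_info['http_code'] together with the parsed Location value. Relative Location values are not resolved here and would need to be made absolute first.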

  • 2021-01-02 11:12

    curl doesn't seem to have a function or option to get the redirect target, but it can be extracted using various techniques:

    From the response:

    Apache can respond with an HTML page in case of a 301 redirect (this doesn't seem to be the case with 302s).

    If the response has a format similar to:

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
    <html><head>
    <title>301 Moved Permanently</title>
    </head><body>
    <h1>Moved Permanently</h1>
    <p>The document has moved <a href="http://www.xxx.yyy/zzz">here</a>.</p>
    <hr>
    <address>Apache/2.2.16 (Debian) Server at www.xxx.yyy Port 80</address>
    </body></html>
    

    You can extract the redirect URL using DOMXPath:

    $results = [];
    $i = 0;
    foreach($urls as $url) {
        if(substr($url,0,4) == "http") {
            $c = curl_init($url);
            curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
            $result = @curl_exec($c);
            $status = curl_getinfo($c,CURLINFO_HTTP_CODE);
            curl_close($c);
            $results[$i]['code'] = $status;
            $results[$i]['url'] = $url;
    
            if($status === 301) {
                $xml = new DOMDocument();
                $xml->loadHTML($result);
                $xpath = new DOMXPath($xml);
                $href = $xpath->query("//*[@href]")->item(0);
                if ($href !== null) { // the body may contain no link at all
                    $results[$i]['target'] = $href->attributes->getNamedItem('href')->nodeValue;
                }
            }
            $i++;
        }
    }
    

    Using CURLOPT_NOBODY

    There is a faster way, however, as @gAMBOOKa points out: using CURLOPT_NOBODY. This approach sends a HEAD request instead of a GET (the actual content is not downloaded, so it should be faster and more efficient) and stores the response header.

    Using a regex the target URL can be extracted from the header:

    $results = [];
    $i = 0;
    foreach($urls as $url) {
        if(substr($url,0,4) == "http") {
            $c = curl_init($url);
            curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
            curl_setopt($c, CURLOPT_NOBODY,true);
            curl_setopt($c, CURLOPT_HEADER, true);
            $result = @curl_exec($c);
            $status = curl_getinfo($c,CURLINFO_HTTP_CODE);
            curl_close($c);
            $results[$i]['code'] = $status;
            $results[$i]['url'] = $url;
    
            if($status === 301 || $status === 302) {
                // takes the first absolute URL found in the response headers
                if (preg_match("@https?://([-\w\.]+)+(:\d+)?(/([\w/_\-\.]*(\?\S+)?)?)?@",$result,$m)) {
                    $results[$i]['target'] = $m[0];
                }
            }
            $i++;
        }
    }
    