The curl_getinfo function returns a lot of metadata about the result of an HTTP request. However, for some reason it doesn\'t include the bit of information I want at the m
curl
doesn't seem to have a function or option to get the redirect target, it can be extracted using various techniques:
From the response:
Apache can respond with a HTML page in case of a 301 redirect (Doesn't seem to be the case with 302's).
If the response has a format similar to:
301 Moved Permanently
Moved Permanently
The document has moved here.
Apache/2.2.16 (Debian) Server at www.xxx.yyy Port 80
You can extract the redirect URL using DOMXPath
:
$i = 0;
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
$result = @curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301) {
$xml = new DOMDocument();
$xml->loadHTML($result);
$xpath = new DOMXPath($xml);
$href = $xpath->query("//*[@href]")->item(0);
$results[$i]['target'] = $href->attributes->getNamedItem('href')->nodeValue;
}
$i++;
}
}
Using CURLOPT_NOBODY
There is a faster way however, as @gAMBOOKa points out; Using CURLOPT_NOBODY
. This approach just sends a HEAD
request instead of GET
(not downloading the actual content, so it should be faster and more efficient) and stores the response header.
Using a regex the target URL can be extracted from the header:
foreach($urls as $url) {
if(substr($url,0,4) == "http") {
$c = curl_init($url);
curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
curl_setopt($c, CURLOPT_NOBODY,true);
curl_setopt($c, CURLOPT_HEADER, true);
$result = @curl_exec($c);
$status = curl_getinfo($c,CURLINFO_HTTP_CODE);
curl_close($c);
$results[$i]['code'] = $status;
$results[$i]['url'] = $url;
if($status === 301 || $status === 302) {
preg_match("@https?://([-\w\.]+)+(:\d+)?(/([\w/_\-\.]*(\?\S+)?)?)?@",$result,$m);
$results[$i]['target'] = $m[0];
}
$i++;
}
}