How do I get the final, redirected, canonical URL of a website using PHP?

Submitted by ♀尐吖头ヾ on 2019-12-03 05:35:35

Since I couldn't find a library that did what I was looking for, and I wanted to do more than just follow HTTP redirects, I created one and released it under the MIT license. You can get it here:

https://github.com/mattwright/URLResolver.php

URLResolver.php is a PHP class that attempts to resolve URLs to a final, canonical link:

  • Follows 301 and 302 redirects found in HTTP headers
  • Follows Open Graph URL <meta> tags found in web page <head>
  • Follows Canonical URL <link> tags found in web page <head>
  • Aborts the download early if the content type is not HTML
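As a rough illustration of the two <head> checks in the list above (this is not URLResolver.php's actual code), here is a minimal sketch using PHP's built-in DOM extension. The function name find_canonical_url() is hypothetical, checking og:url before rel="canonical" is an assumption, and a relative href is returned as-is rather than resolved against the page URL.

```php
<?php
// Sketch: find a canonical URL in an already-downloaded HTML page using
// PHP's built-in DOM extension. find_canonical_url() is a hypothetical name,
// not URLResolver.php's API. Checking og:url before rel="canonical" is an
// assumption, and relative href values are returned unresolved.

function find_canonical_url(string $html): ?string
{
    // DOMDocument::loadHTML() rejects an empty string.
    if (trim($html) === '') {
        return null;
    }

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // 1. Open Graph: <meta property="og:url" content="...">
    foreach ($xpath->query('//head/meta[@property="og:url"]') as $meta) {
        $content = trim($meta->getAttribute('content'));
        if ($content !== '') {
            return $content;
        }
    }

    // 2. Canonical link: <link rel="canonical" href="..."> (rel may hold several tokens)
    foreach ($xpath->query('//head/link[@rel and @href]') as $link) {
        $rels = preg_split('/\s+/', strtolower(trim($link->getAttribute('rel'))));
        $href = trim($link->getAttribute('href'));
        if (in_array('canonical', $rels, true) && $href !== '') {
            return $href;
        }
    }

    return null;
}
```

In a resolver loop, this would be called on the body of a response that already passed the HTML content-type check; if it returns a URL different from the current one, that URL would be fetched next.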

I am certainly not an expert on the rules of HTTP redirection, so suggestions for improving this library are very welcome. I have tested it on thousands of URLs and it seems to do pretty well. I followed Mario's advice and used the PHP Simple HTML DOM Parser library where needed.

Using Guzzle (a well-known, robust HTTP client), you can do it like this:

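<?php
// Guzzle 3 API (Guzzle\Http namespace); Guzzle 4+ moved to GuzzleHttp\ with a different API.
// Assumes Guzzle was installed with Composer.
require 'vendor/autoload.php';

use Guzzle\Http\Client as GuzzleClient;
use Guzzle\Plugin\History\HistoryPlugin;

// A plain function; "public" is only valid on a method inside a class.
function resolveUrl($url)
{
    $client  = new GuzzleClient($url);

    // Optional: records the requests this client sends, e.g. for debugging
    $history = new HistoryPlugin();
    $client->addSubscriber($history);

    // HEAD avoids downloading the body; Guzzle 3 follows redirects by default
    $response = $client->head($url)->send();

    if (!$response->isSuccessful()) {
        throw new \Exception(sprintf("Url %s is not a valid URL or website is down.", $url));
    }

    // URL of the last request in the redirect chain
    return $response->getEffectiveUrl();
}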
Homer6

I wrote you a little function to do it. It's simple, but it may be a starting point for you. Note: the http://dlvr.it/xxb0W URL returns an invalid URL in its Location response header.

You'll need the Altumo PHP library for it to work. It's a library I wrote, released under the MIT license, as is this function.

See: https://github.com/homer6/altumo

Also, you'll have to wrap calls to the function in a try/catch, since it throws an \Exception on error.

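/**
 * Gets the final URL of a URL that will be redirected.
 *
 * @param string $url_string     the URL to start from
 * @param int    $max_redirects  maximum number of redirects to follow
 * @throws \Exception            on error, or when $max_redirects is exceeded
 * @return string
 */
function get_final_url( $url_string, $max_redirects = 10 ){

    for( $redirects = 0; $redirects <= $max_redirects; $redirects++ ){

        //validate URL
        $url = new \Altumo\String\Url( $url_string );

        //get the Location response header of the URL
        $client = new \Altumo\Http\OutgoingHttpRequest( $url_string );
        $response = $client->sendAndGetResponseMessage();
        $location = $response->getHeader( 'Location' );

        //no Location header: this is the final URL
        if( is_null($location) ){
            return $url_string;
        }

        //otherwise follow the redirect
        //(a relative Location value is not resolved against $url_string here)
        $url_string = $location;

    }

    //a redirect loop or an unusually long chain
    throw new \Exception( sprintf( 'More than %d redirects; last Location was %s', $max_redirects, $url_string ) );

}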

echo get_final_url( 'your url here' );
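The same follow-the-Location-header idea can also be written with the HTTP request passed in as a callable. That keeps the loop testable without network access and makes it easy to detect a redirect cycle as well as cap the number of requests. This is a minimal sketch, not part of Altumo: follow_redirects() and $fetchLocation are hypothetical names, and the callable is assumed to return an absolute URL (or null when there is no redirect).

```php
<?php
// Sketch: follow Location headers with the HTTP request injected as a callable.
// follow_redirects() and $fetchLocation are hypothetical names. $fetchLocation
// receives a URL and must return that URL's Location header value as an
// absolute URL, or null when the response is not a redirect.

function follow_redirects(string $url, callable $fetchLocation, int $maxRequests = 10): string
{
    $seen = [$url => true];

    for ($i = 0; $i < $maxRequests; $i++) {
        $location = $fetchLocation($url);

        // No Location header: $url is the final URL.
        if ($location === null || $location === '') {
            return $url;
        }

        // A URL we already visited means the redirects form a cycle.
        if (isset($seen[$location])) {
            throw new RuntimeException("Redirect loop detected at $location");
        }

        $seen[$location] = true;
        $url = $location;
    }

    throw new RuntimeException("No final URL after $maxRequests requests (last: $url)");
}
```

With the Altumo classes used above, the callable might be fn($u) => (new \Altumo\Http\OutgoingHttpRequest($u))->sendAndGetResponseMessage()->getHeader('Location'), assuming getHeader() returns null when the header is absent, as the is_null() check in the original suggests.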

Please let me know if you'd like further modifications or help getting it going.
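One caveat that applies to any loop that feeds a Location value back in as the next URL: the header does not have to be an absolute URL. RFC 7231 (section 7.1.2) defines it as a URI reference, so values such as /login or ../next are legal, and passing them straight to an HTTP client will usually fail. Here is a minimal sketch of RFC 3986 reference resolution for that case. It assumes PHP 8 (for str_starts_with) and an absolute http(s) base URL; resolve_location(), remove_dot_segments() and url_authority() are hypothetical helpers, not part of Altumo or Guzzle.

```php
<?php
// Sketch (PHP 8+, uses str_starts_with): resolve a Location header value
// against the URL that returned it, following RFC 3986 section 5.2.
// resolve_location(), remove_dot_segments() and url_authority() are
// hypothetical helpers, not part of Altumo or any other library.

// RFC 3986 section 5.2.4: collapse "." and ".." segments in a path.
function remove_dot_segments(string $in): string
{
    $out = '';
    while ($in !== '') {
        if (str_starts_with($in, '../')) {
            $in = substr($in, 3);
        } elseif (str_starts_with($in, './')) {
            $in = substr($in, 2);
        } elseif (str_starts_with($in, '/./') || $in === '/.') {
            $in = '/' . substr($in, 3);
        } elseif (str_starts_with($in, '/../') || $in === '/..') {
            $in = '/' . substr($in, 4);
            // Drop the last segment already written to the output.
            $cut = strrpos($out, '/');
            $out = ($cut === false) ? '' : substr($out, 0, $cut);
        } elseif ($in === '.' || $in === '..') {
            $in = '';
        } else {
            // Move the first segment (with its leading "/", if any) to the output.
            $next = strpos($in, '/', 1);
            if ($next === false) {
                $out .= $in;
                $in = '';
            } else {
                $out .= substr($in, 0, $next);
                $in = substr($in, $next);
            }
        }
    }
    return $out;
}

// Rebuild "user:pass@host:port" from parse_url() parts; null if there is no host.
function url_authority(array $p): ?string
{
    if (!isset($p['host'])) {
        return null;
    }
    $auth = isset($p['user']) ? $p['user'] . (isset($p['pass']) ? ':' . $p['pass'] : '') . '@' : '';
    $auth .= $p['host'];
    return isset($p['port']) ? $auth . ':' . $p['port'] : $auth;
}

function resolve_location(string $base, string $location): string
{
    $b = parse_url($base);
    $r = parse_url($location);
    if ($b === false || !isset($b['scheme'], $b['host'])) {
        throw new InvalidArgumentException("Base is not an absolute URL: $base");
    }
    if ($r === false) {
        throw new InvalidArgumentException("Unparseable Location value: $location");
    }

    $rPath = $r['path'] ?? '';
    $query = $r['query'] ?? null;

    if (isset($r['scheme'])) {                      // absolute URL
        $scheme = $r['scheme'];
        $auth = url_authority($r);
        $path = remove_dot_segments($rPath);
    } elseif (url_authority($r) !== null) {         // scheme-relative: //host/path
        $scheme = $b['scheme'];
        $auth = url_authority($r);
        $path = remove_dot_segments($rPath);
    } else {
        $scheme = $b['scheme'];
        $auth = url_authority($b);
        $bPath = $b['path'] ?? '';
        if ($rPath === '') {                        // only a query and/or fragment
            $path = $bPath;
            $query = $query ?? ($b['query'] ?? null);
        } elseif ($rPath[0] === '/') {              // absolute path
            $path = remove_dot_segments($rPath);
        } else {                                    // relative path: merge with base directory
            $dir = ($bPath === '') ? '/' : substr($bPath, 0, strrpos($bPath, '/') + 1);
            $path = remove_dot_segments($dir . $rPath);
        }
    }

    $url = $scheme . ':' . ($auth !== null ? '//' . $auth : '') . $path;
    if ($query !== null) {
        $url .= '?' . $query;
    }
    if (isset($r['fragment'])) {
        $url .= '#' . $r['fragment'];
    }
    return $url;
}
```

For example, resolve_location('http://example.com/a/b', '../c') gives http://example.com/c, which can then be requested as the next hop.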
