How do I get the final, redirected, canonical URL of a website using PHP?

无人及你 2021-02-07 15:49

In the days of link shorteners and Ajax, there can be many links that ultimately point to the same content. I was wondering what the best way is to get the final, best link for

3 answers
  •  梦谈多话
    2021-02-07 16:29

    Since I wasn't able to find any libraries that really did what I was looking for, and I was hoping to do more than just follow HTTP redirects, I have gone ahead and created a library that accomplishes the goals and released it under the MIT license. You can get it here:

    https://github.com/mattwright/URLResolver.php

    URLResolver.php is a PHP class that attempts to resolve URLs to a final, canonical link:

    • Follows 301 and 302 redirects found in HTTP headers
    • Follows Open Graph URL tags found in the web page
    • Follows canonical URL tags found in the web page
    • Aborts the download quickly if the content type is not an HTML page
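
As a rough illustration of the first three steps listed above, here is a minimal sketch that uses only PHP's bundled cURL and DOM extensions. It is not the URLResolver.php API: the function names `followRedirects` and `extractCanonical` are hypothetical, and checking `og:url` before `rel="canonical"` is an assumption made for this sketch, not a documented behavior of the library.

```php
<?php
// Hypothetical helpers for illustration only; not part of URLResolver.php.

// Follow HTTP redirects with cURL and return the last effective URL,
// or null if the request fails.
function followRedirects(string $url, int $maxRedirects = 10): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_NOBODY         => true,  // HEAD request: fetch headers only
        CURLOPT_FOLLOWLOCATION => true,  // follow Location headers
        CURLOPT_MAXREDIRS      => $maxRedirects,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_TIMEOUT        => 10,
    ]);
    $result = curl_exec($ch);
    if ($result === false) {
        return null;
    }
    $finalUrl = curl_getinfo($ch, CURLINFO_EFFECTIVE_URL);
    return $finalUrl !== '' ? $finalUrl : null;
}

// Look for an Open Graph URL tag or a canonical link tag in an HTML string.
// Returns the first non-empty value found, or null if neither is present.
function extractCanonical(string $html): ?string
{
    if ($html === '') {
        return null;
    }
    $doc = new DOMDocument();
    // Real-world HTML is often malformed; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return null;
    }

    $xpath = new DOMXPath($doc);
    // Order is an assumption of this sketch: og:url first, then rel="canonical".
    // Attribute values are matched case-sensitively here.
    $queries = [
        '//meta[@property="og:url"]/@content',
        '//link[@rel="canonical"]/@href',
    ];
    foreach ($queries as $query) {
        $nodes = $xpath->query($query);
        if ($nodes !== false && $nodes->length > 0) {
            $value = trim($nodes->item(0)->nodeValue);
            if ($value !== '') {
                return $value;
            }
        }
    }
    return null;
}
```

A caller could pass a short link to `followRedirects`, download the page at the returned URL, and hand the HTML to `extractCanonical`. The HEAD request avoids downloading the body while chasing redirects, but some servers answer HEAD differently from GET, so a real resolver may need to fall back to GET.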

    I am certainly not an expert on the rules of HTTP redirection, so if anyone has suggestions on how to improve this library, they would be greatly appreciated. I have tested it on thousands of URLs and it seems to do pretty well. I followed Mario's advice and used the PHP Simple HTML Parser library where needed.
