How can I check if a URL exists via PHP?

天涯浪人 2020-11-22 04:13

How do I check if a URL exists (not 404) in PHP?

22 answers
  •  误落风尘
    2020-11-22 05:01

    karim79's get_headers() solution didn't work for me, as I got crazy results with Pinterest.

    get_headers(): SSL operation failed with code 1. OpenSSL Error messages: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
    
    Array
    (
        [url] => https://www.pinterest.com/jonathan_parl/
        [exists] => 
    )
    
    get_headers(): Failed to enable crypto
    
    Array
    (
        [url] => https://www.pinterest.com/jonathan_parl/
        [exists] => 
    )
    
    get_headers(https://www.pinterest.com/jonathan_parl/): failed to open stream: operation failed
    
    Array
    (
        [url] => https://www.pinterest.com/jonathan_parl/
        [exists] => 
    ) 
    

    Anyway, this developer demonstrates that cURL is way faster than get_headers():

    http://php.net/manual/fr/function.get-headers.php#104723

    Since many people asked karim79 to fix his cURL solution, here's the solution I built today.

    /**
    * Send an HTTP request to the $url and check the headers sent back.
    *
    * @param $url String URL to which the request is sent.
    * @param $failCodeList Int array list of status codes for which the page is considered invalid.
    *
    * @return Boolean
    */
    public static function isUrlExists($url, array $failCodeList = array(404)){
    
        $exists = false;
    
        if(!StringManager::stringStartWith($url, "http") and !StringManager::stringStartWith($url, "ftp")){
    
            $url = "https://" . $url;
        }
    
        if (preg_match(RegularExpression::URL, $url)){
    
            $handle = curl_init($url);
    
    
            curl_setopt($handle, CURLOPT_RETURNTRANSFER, true);
    
            curl_setopt($handle, CURLOPT_SSL_VERIFYPEER, false);
    
            curl_setopt($handle, CURLOPT_HEADER, true);
    
            curl_setopt($handle, CURLOPT_NOBODY, true);
    
            curl_setopt($handle, CURLOPT_USERAGENT, "Mozilla/5.0 (compatible; UrlChecker/1.0)"); // must be a string; this value is only an example
    
    
            $headers = curl_exec($handle);
    
            curl_close($handle);
    
    
            if (empty($failCodeList) or !is_array($failCodeList)){
    
                $failCodeList = array(404); 
            }
    
            if (!empty($headers)){
    
                $exists = true;
    
                $headers = explode(PHP_EOL, $headers);
    
                foreach($failCodeList as $code){
    
                    if (is_numeric($code) and strpos($headers[0], strval($code)) !== false){
    
                        $exists = false;
    
                        break;  
                    }
                }
            }
        }
    
        return $exists;
    }
    

    Let me explain the cURL options:

    CURLOPT_RETURNTRANSFER: return the response as a string instead of printing it directly.

    CURLOPT_SSL_VERIFYPEER: when set to false, cURL won't verify the server's certificate. This works around the SSL errors shown above, at the cost of certificate checking.

    CURLOPT_HEADER: include the response headers in the returned string.

    CURLOPT_NOBODY: send a HEAD request, so the body is not included in the string.

    CURLOPT_USERAGENT: some sites need a user agent to respond properly (for example: https://plus.google.com).
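    As a rough illustration of the options listed above, here is a minimal sketch that groups them into one array and reads the status code with curl_getinfo() rather than parsing the header text. The function names headOptions() and fetchStatusCode() and the default user-agent string are hypothetical, and the sketch assumes the cURL extension is loaded.

```php
<?php
/**
 * Build the cURL options for a header-only request.
 * headOptions() and the default user-agent string are hypothetical.
 */
function headOptions(string $userAgent = 'url-checker-example/1.0'): array
{
    return [
        CURLOPT_RETURNTRANSFER => true,       // return the response instead of printing it
        CURLOPT_HEADER         => true,       // include response headers in the returned string
        CURLOPT_NOBODY         => true,       // send a HEAD request, skip the body
        CURLOPT_USERAGENT      => $userAgent, // expects a string, not a boolean
        CURLOPT_SSL_VERIFYPEER => false,      // as in the answer; skips certificate verification
    ];
}

/**
 * Return the HTTP status code for $url, or null if the request failed.
 * fetchStatusCode() is a hypothetical helper.
 */
function fetchStatusCode(string $url): ?int
{
    $handle = curl_init($url);
    curl_setopt_array($handle, headOptions());
    $response = curl_exec($handle);

    // curl_exec() returns false on transport errors (DNS failure, refused connection, ...).
    if ($response === false) {
        return null;
    }

    // The handle is released automatically when it goes out of scope.
    return (int) curl_getinfo($handle, CURLINFO_HTTP_CODE);
}
```

    Calling fetchStatusCode("https://www.pinterest.com/") would then give an integer to compare against the list of rejected codes. Keeping the options in one array also makes it easy to re-enable certificate verification later.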


    Additional note: In this function I'm using Diego Perini's regex for validating the URL before sending the request:

    const URL = "%^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@|\d{1,3}(?:\.\d{1,3}){3}|(?:(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)(?:\.(?:[a-z\d\x{00a1}-\x{ffff}]+-?)*[a-z\d\x{00a1}-\x{ffff}]+)*(?:\.[a-z\x{00a1}-\x{ffff}]{2,6}))(?::\d+)?(?:[^\s]*)?$%iu"; //@copyright Diego Perini
    

    Additional note 2: I explode the header string and use headers[0] so that only the status line (return code and message, for example 200, 404, 405, etc.) is checked.
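    The status-line idea in note 2 can be sketched as a small standalone function. This is only an illustration: parseStatusCode() is a hypothetical name, and it extracts the code as an integer with a regular expression instead of searching the line with strpos().

```php
<?php
/**
 * Extract the numeric status code from raw response headers.
 * parseStatusCode() is a hypothetical helper, not part of the answer's code.
 */
function parseStatusCode(string $rawHeaders): ?int
{
    // Split on any line ending; HTTP uses "\r\n", which PHP_EOL only equals on Windows.
    $lines = preg_split('/\r\n|\n|\r/', trim($rawHeaders));

    // The first line is the status line, e.g. "HTTP/1.1 404 Not Found" or "HTTP/2 200".
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})\b#', $lines[0], $matches) === 1) {
        return (int) $matches[1];
    }

    // Anything else is not a recognisable status line.
    return null;
}
```

    For example, parseStatusCode("HTTP/1.1 404 Not Found\r\n\r\n") returns 404, and an empty string returns null.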

    Additional note 3: Sometimes checking only for code 404 is not enough (see the unit test), so there's an optional $failCodeList parameter for supplying the full list of codes to reject.
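    To make the role of $failCodeList in note 3 concrete, here is a minimal sketch of the rejection check on its own, assuming the status code is already available as an integer. isRejectedStatus() is a hypothetical name; the fallback to 404 mirrors the default used in the function above.

```php
<?php
/**
 * Decide whether a status code counts as "URL does not exist".
 * isRejectedStatus() is a hypothetical helper.
 */
function isRejectedStatus(int $status, array $failCodeList = [404]): bool
{
    // Fall back to 404 when the caller passes an empty list.
    if ($failCodeList === []) {
        $failCodeList = [404];
    }

    // Keep only numeric entries and normalise them to integers,
    // so "405" and 405 are treated the same.
    $codes = array_map('intval', array_filter($failCodeList, 'is_numeric'));

    return in_array($status, $codes, true);
}
```

    With this helper, a 405 is accepted by default but rejected when the caller passes array(404, 405), which is the situation the Instagram test case covers.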

    And, of course, here's the unit test (including all the popular social networks) to back up my code:

    public function testIsUrlExists(){
    
    //invalid
    $this->assertFalse(ToolManager::isUrlExists("woot"));
    
    $this->assertFalse(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque4545646456"));
    
    $this->assertFalse(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque890800"));
    
    $this->assertFalse(ToolManager::isUrlExists("https://instagram.com/mariloubiz1232132/", array(404, 405)));
    
    $this->assertFalse(ToolManager::isUrlExists("https://www.pinterest.com/jonathan_parl1231/"));
    
    $this->assertFalse(ToolManager::isUrlExists("https://regex101.com/546465465456"));
    
    $this->assertFalse(ToolManager::isUrlExists("https://twitter.com/arcadefire4566546"));
    
    $this->assertFalse(ToolManager::isUrlExists("https://vimeo.com/**($%?%$", array(400, 405)));
    
    $this->assertFalse(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666456456456"));
    
    
    //valid
    $this->assertTrue(ToolManager::isUrlExists("www.google.ca"));
    
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    
    $this->assertTrue(ToolManager::isUrlExists("https://plus.google.com/+JonathanParentL%C3%A9vesque"));
    
    $this->assertTrue(ToolManager::isUrlExists("https://instagram.com/mariloubiz/"));
    
    $this->assertTrue(ToolManager::isUrlExists("https://www.facebook.com/jonathan.parentlevesque"));
    
    $this->assertTrue(ToolManager::isUrlExists("https://www.pinterest.com/"));
    
    $this->assertTrue(ToolManager::isUrlExists("https://regex101.com"));
    
    $this->assertTrue(ToolManager::isUrlExists("https://twitter.com/arcadefire"));
    
    $this->assertTrue(ToolManager::isUrlExists("https://vimeo.com/"));
    
    $this->assertTrue(ToolManager::isUrlExists("https://www.youtube.com/user/Darkjo666"));
    }
    

    Great success to all,

    Jonathan Parent-Lévesque from Montreal
