PHP validation/regex for URL

青春惊慌失措 2020-11-22 01:19

I've been looking for a simple regex for URLs, does anybody have one handy that works well? I didn't find one with the Zend Framework validation classes and have seen sev

21 Answers
  • 2020-11-22 01:55

    Just in case you want to know if the url really exists:

    function url_exist($url){ // returns true if the request to the URL succeeds
        $c=curl_init();
        curl_setopt($c,CURLOPT_URL,$url);
        curl_setopt($c,CURLOPT_HEADER,1);//get the header
        curl_setopt($c,CURLOPT_NOBODY,1);//and *only* get the header
        curl_setopt($c,CURLOPT_RETURNTRANSFER,1);//get the response as a string from curl_exec(), rather than echoing it
        curl_setopt($c,CURLOPT_FRESH_CONNECT,1);//force a new connection instead of reusing a cached one
        if(!curl_exec($c)){
            //echo $url.' does not exist';
            return false;
        }else{
            //echo $url.' exists';
            // note: a response with an error status (e.g. 404) still ends up here
            return true;
        }
        // Stricter alternative: after a successful curl_exec(), decide by status code instead of returning true:
        //$httpcode=curl_getinfo($c,CURLINFO_HTTP_CODE);
        //return ($httpcode<400);
    }
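
    The commented-out status check at the end of that function can be turned into a small variant. This is only a sketch: the names status_means_exists and url_exist_by_status, the 200 to 399 range, and the 10 second timeout are assumptions, not part of the original answer.

```php
<?php
// Hypothetical helper: treat any status from 200 to 399 as "exists".
function status_means_exists(int $httpCode): bool
{
    return $httpCode >= 200 && $httpCode < 400;
}

// Hypothetical variant of url_exist() that decides by HTTP status code.
function url_exist_by_status(string $url): bool
{
    $c = curl_init();
    curl_setopt($c, CURLOPT_URL, $url);
    curl_setopt($c, CURLOPT_NOBODY, true);         // HEAD-style request, no body
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true); // do not echo the response
    curl_setopt($c, CURLOPT_TIMEOUT, 10);          // assumed timeout, in seconds
    if (curl_exec($c) === false) {
        return false; // DNS failure, refused connection, timeout, ...
    }
    return status_means_exists((int) curl_getinfo($c, CURLINFO_HTTP_CODE));
}
```

    The strict comparison with false matters here: without CURLOPT_HEADER, a successful request with no body returns an empty string, which a plain !curl_exec($c) would treat as a failure.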
    
  • 2020-11-22 01:56

    OK, so this is a little bit more complex than a simple regex, but it allows for different types of URLs.

    Examples:

    • google.com
    • www.microsoft.com/
    • http://www.yahoo.com/
    • https://www.bandcamp.com/artist/#!someone-special!

    All of which should be marked as valid.

    function is_valid_url($url) {
        // First check: is the url just a domain name? (allow a slash at the end)
        $_domain_regex = "|^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})/?$|";
        if (preg_match($_domain_regex, $url)) {
            return true;
        }
    
        // Second: Check if it's a url with a scheme and all
        $_regex = '#^([a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))$#';
        if (preg_match($_regex, $url, $matches)) {
            // pull out the domain name, and make sure that the domain is valid.
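            $_parts = parse_url($url);
            // parse_url() omits 'scheme' and 'host' for inputs such as "google.com/path", so check they exist first
            if (!isset($_parts['scheme']) || !in_array($_parts['scheme'], array( 'http', 'https' )))
                return false;
    
            // Check the host against the domain regex (note: that regex does not reject a leading hyphen such as "-example.com")
            if (!isset($_parts['host']) || !preg_match($_domain_regex, $_parts['host']))
                return false;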
    
            // This domain looks pretty valid. Only way to check it now is to download it!
            return true;
        }
    
        return false;
    }
    

    Note that there is an in_array check for the protocols that you want to allow (currently only http and https are in that list).

    var_dump(is_valid_url('google.com'));         // true
    var_dump(is_valid_url('google.com/'));        // true
    var_dump(is_valid_url('http://google.com'));  // true
    var_dump(is_valid_url('http://google.com/')); // true
    var_dump(is_valid_url('https://google.com')); // true
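
    The protocol check mentioned above can also be pulled out so that the allowed list becomes a parameter. A minimal sketch, assuming a hypothetical helper named has_allowed_scheme that is not part of the function above:

```php
<?php
// Hypothetical helper: the list of accepted schemes is a parameter
// instead of being hard-coded inside the validator.
function has_allowed_scheme(string $url, array $allowed = ['http', 'https']): bool
{
    $scheme = parse_url($url, PHP_URL_SCHEME);
    // parse_url() gives null when there is no scheme and false on
    // seriously malformed input; neither is an allowed scheme.
    if (!is_string($scheme)) {
        return false;
    }
    return in_array(strtolower($scheme), $allowed, true);
}

var_dump(has_allowed_scheme('https://google.com'));                                       // bool(true)
var_dump(has_allowed_scheme('ftp://example.org/resource.txt'));                           // bool(false)
var_dump(has_allowed_scheme('ftp://example.org/resource.txt', ['http', 'https', 'ftp'])); // bool(true)
```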
    
  • 2020-11-22 01:58

    I don't think that using regular expressions is a smart thing to do in this case. It is impossible to match all of the possibilities, and even if you did, there is still a chance that the URL simply doesn't exist.

    Here is a very simple way to test whether a URL actually exists and is readable:

    if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";
    

    (Without the preg_match check, this would also accept any file path on your server that fopen can read.)
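
    To see why the guard matters, the scheme check can be isolated and tried against a local path. A small sketch, where looks_like_http_url is a hypothetical name and not part of the answer:

```php
<?php
// Hypothetical helper isolating the scheme guard from the one-liner above.
function looks_like_http_url(string $link): bool
{
    return preg_match('#^https?://.+#', $link) === 1;
}

// A local path never reaches fopen() because the guard fails first.
$link = __FILE__; // this script itself: a file that fopen() could open
var_dump(looks_like_http_url($link));                 // bool(false)
var_dump(looks_like_http_url('http://example.com/')); // bool(true)
```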

  • 2020-11-22 01:59

    For anyone developing with WordPress, just use

    esc_url_raw($url) === $url
    

    to validate a URL (here's WordPress' documentation on esc_url_raw). It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL) because it is Unicode-safe and XSS-safe. (Here is a good article mentioning all the problems with filter_var).

  • 2020-11-22 02:00

    According to the PHP manual, parse_url should not be used to validate a URL.

    Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL) does not perform any better.

    Both parse_url() and filter_var() will pass malformed URLs such as http://...

    Therefore, in this case, a regex is the better method.
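
    A quick way to check these claims on your own PHP version is to run both built-ins side by side. The helper name builtin_url_checks is hypothetical, and what each built-in reports for malformed input such as http://... may differ between PHP releases, so no output is shown here.

```php
<?php
// Hypothetical helper: report what each built-in says about a string.
function builtin_url_checks(string $url): array
{
    return [
        // parse_url() only returns false on seriously malformed input
        'parse_url'  => parse_url($url) !== false,
        // filter_var() returns the string itself on success, false otherwise
        'filter_var' => filter_var($url, FILTER_VALIDATE_URL) !== false,
    ];
}

foreach (['http://example.com', 'example.com', 'http://...'] as $candidate) {
    echo $candidate, ' => ', json_encode(builtin_url_checks($candidate)), PHP_EOL;
}
```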

  • 2020-11-22 02:01

    Inspired by this .NET Stack Overflow question and by an article referenced from that question, here is a URI validator (URI means it validates both URLs and URNs).

    if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
    {
        throw new \RuntimeException( "URI has not a valid format." );
    }
    

    I have successfully unit-tested this function inside a ValueObject I made named Uri and tested by UriTest.

    UriTest.php (Contains valid and invalid cases for both URLs and URNs)

    <?php
    
    declare( strict_types = 1 );
    
    namespace XaviMontero\ThrasherPortage\Tests\Tour;
    
    use XaviMontero\ThrasherPortage\Tour\Uri;
    
    class UriTest extends \PHPUnit_Framework_TestCase
    {
        private $sut;
    
        public function testCreationIsOfProperClassWhenUriIsValid()
        {
            $sut = new Uri( 'http://example.com' );
            $this->assertInstanceOf( 'XaviMontero\\ThrasherPortage\\Tour\\Uri', $sut );
        }
    
        /**
         * @dataProvider urlIsValidProvider
         * @dataProvider urnIsValidProvider
         */
        public function testGetUriAsStringWhenUriIsValid( string $uri )
        {
            $sut = new Uri( $uri );
            $actual = $sut->getUriAsString();
    
            $this->assertInternalType( 'string', $actual );
            $this->assertEquals( $uri, $actual );
        }
    
        public function urlIsValidProvider()
        {
            return
                [
                    [ 'http://example-server' ],
                    [ 'http://example.com' ],
                    [ 'http://example.com/' ],
                    [ 'http://subdomain.example.com/path/?parameter1=value1&parameter2=value2' ],
                    [ 'random-protocol://example.com' ],
                    [ 'http://example.com:80' ],
                    [ 'http://example.com?no-path-separator' ],
                    [ 'http://example.com/pa%20th/' ],
                    [ 'ftp://example.org/resource.txt' ],
                    [ 'file://../../../relative/path/needs/protocol/resource.txt' ],
                    [ 'http://example.com/#one-fragment' ],
                    [ 'http://example.edu:8080#one-fragment' ],
                ];
        }
    
        public function urnIsValidProvider()
        {
            return
                [
                    [ 'urn:isbn:0-486-27557-4' ],
                    [ 'urn:example:mammal:monotreme:echidna' ],
                    [ 'urn:mpeg:mpeg7:schema:2001' ],
                    [ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                    [ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
                    [ 'urn:FOO:a123,456' ]
                ];
        }
    
        /**
         * @dataProvider urlIsNotValidProvider
         * @dataProvider urnIsNotValidProvider
         */
        public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
        {
            $this->expectException( 'RuntimeException' );
            $this->sut = new Uri( $uri );
        }
    
        public function urlIsNotValidProvider()
        {
            return
                [
                    [ 'only-text' ],
                    [ 'http//missing.colon.example.com/path/?parameter1=value1&parameter2=value2' ],
                    [ 'missing.protocol.example.com/path/' ],
                    [ 'http://example.com\\bad-separator' ],
                    [ 'http://example.com|bad-separator' ],
                    [ 'ht tp://example.com' ],
                    [ 'http://exampl e.com' ],
                    [ 'http://example.com/pa th/' ],
                    [ '../../../relative/path/needs/protocol/resource.txt' ],
                    [ 'http://example.com/#two-fragments#not-allowed' ],
                    [ 'http://example.edu:portMustBeANumber#one-fragment' ],
                ];
        }
    
        public function urnIsNotValidProvider()
        {
            return
                [
                    [ 'urn:mpeg:mpeg7:sch ema:2001' ],
                    [ 'urn|mpeg:mpeg7:schema:2001' ],
                    [ 'urn?mpeg:mpeg7:schema:2001' ],
                    [ 'urn%mpeg:mpeg7:schema:2001' ],
                    [ 'urn#mpeg:mpeg7:schema:2001' ],
                ];
        }
    }
    

    Uri.php (Value Object)

    <?php
    
    declare( strict_types = 1 );
    
    namespace XaviMontero\ThrasherPortage\Tour;
    
    class Uri
    {
        /** @var string */
        private $uri;
    
        public function __construct( string $uri )
        {
            $this->assertUriIsCorrect( $uri );
            $this->uri = $uri;
        }
    
        public function getUriAsString()
        {
            return $this->uri;
        }
    
        private function assertUriIsCorrect( string $uri )
        {
            // https://stackoverflow.com/questions/30847/regex-to-validate-uris
            // http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/
    
            if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
            {
                throw new \RuntimeException( "URI has not a valid format." );
            }
        }
    }
    

    Running UnitTests

    There are 65 assertions in 46 tests. Caution: there are two data providers for valid expressions and two more for invalid ones; in each pair, one covers URLs and the other covers URNs. If you are using PHPUnit v5.6* or earlier, you need to join the two data providers into a single one.

    xavi@bromo:~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
    PHPUnit 5.7.3 by Sebastian Bergmann and contributors.
    
    ..............................................                    46 / 46 (100%)
    
    Time: 82 ms, Memory: 4.00MB
    
    OK (46 tests, 65 assertions)
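
    Joining two data providers for older PHPUnit versions, as mentioned above, could look roughly like this. The providers are shortened copies written as plain functions so the sketch runs without PHPUnit, and uriIsValidProvider is a hypothetical name.

```php
<?php
// Shortened copies of the two providers from UriTest, as plain functions.
function urlIsValidProvider(): array
{
    return [
        [ 'http://example.com' ],
        [ 'ftp://example.org/resource.txt' ],
    ];
}

function urnIsValidProvider(): array
{
    return [
        [ 'urn:isbn:0-486-27557-4' ],
        [ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
    ];
}

// Hypothetical combined provider: a test method would then carry a single
// "@dataProvider uriIsValidProvider" annotation instead of two.
function uriIsValidProvider(): array
{
    return array_merge(urlIsValidProvider(), urnIsValidProvider());
}

echo count(uriIsValidProvider()), PHP_EOL; // 4
```

    Inside the test class the same idea applies with $this->urlIsValidProvider() and $this->urnIsValidProvider().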
    

    Code coverage

    There is 100% code coverage in this sample URI checker.
