I\'ve been looking for a simple regex for URLs, does anybody have one handy that works well? I didn\'t find one with the zend framework validation classes and have seen sev
Just in case you want to know if the url really exists:
function url_exist($url){//se passar a URL existe
$c=curl_init();
curl_setopt($c,CURLOPT_URL,$url);
curl_setopt($c,CURLOPT_HEADER,1);//get the header
curl_setopt($c,CURLOPT_NOBODY,1);//and *only* get the header
curl_setopt($c,CURLOPT_RETURNTRANSFER,1);//get the response as a string from curl_exec(), rather than echoing it
curl_setopt($c,CURLOPT_FRESH_CONNECT,1);//don't use a cached version of the url
if(!curl_exec($c)){
//echo $url.' inexists';
return false;
}else{
//echo $url.' exists';
return true;
}
//$httpcode=curl_getinfo($c,CURLINFO_HTTP_CODE);
//return ($httpcode<400);
}
OK, so this is a little bit more complex then a simple regex, but it allows for different types of urls.
Examples:
All which should be marked as valid.
function is_valid_url($url) {
// First check: is the url just a domain name? (allow a slash at the end)
$_domain_regex = "|^[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*(\.[A-Za-z]{2,})/?$|";
if (preg_match($_domain_regex, $url)) {
return true;
}
// Second: Check if it's a url with a scheme and all
$_regex = '#^([a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))$#';
if (preg_match($_regex, $url, $matches)) {
// pull out the domain name, and make sure that the domain is valid.
$_parts = parse_url($url);
if (!in_array($_parts['scheme'], array( 'http', 'https' )))
return false;
// Check the domain using the regex, stops domains like "-example.com" passing through
if (!preg_match($_domain_regex, $_parts['host']))
return false;
// This domain looks pretty valid. Only way to check it now is to download it!
return true;
}
return false;
}
Note that there is a in_array check for the protocols that you want to allow (currently only http and https are in that list).
var_dump(is_valid_url('google.com')); // true
var_dump(is_valid_url('google.com/')); // true
var_dump(is_valid_url('http://google.com')); // true
var_dump(is_valid_url('http://google.com/')); // true
var_dump(is_valid_url('https://google.com')); // true
I don't think that using regular expressions is a smart thing to do in this case. It is impossible to match all of the possibilities and even if you did, there is still a chance that url simply doesn't exist.
Here is a very simple way to test if url actually exists and is readable :
if (preg_match("#^https?://.+#", $link) and @fopen($link,"r")) echo "OK";
(if there is no preg_match
then this would also validate all filenames on your server)
For anyone developing with WordPress, just use
esc_url_raw($url) === $url
to validate a URL (here's WordPress' documentation on esc_url_raw). It handles URLs much better than filter_var($url, FILTER_VALIDATE_URL)
because it is unicode and XSS-safe. (Here is a good article mentioning all the problems with filter_var).
As per the PHP manual - parse_url should not be used to validate a URL.
Unfortunately, it seems that filter_var('example.com', FILTER_VALIDATE_URL)
does not perform any better.
Both parse_url()
and filter_var()
will pass malformed URLs such as http://...
Therefore in this case - regex is the better method.
Inspired in this .NET StackOverflow question and in this referenced article from that question there is this URI validator (URI means it validates both URL and URN).
if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
throw new \RuntimeException( "URI has not a valid format." );
}
I have successfully unit-tested this function inside a ValueObject I made named Uri
and tested by UriTest
.
<?php
declare( strict_types = 1 );
namespace XaviMontero\ThrasherPortage\Tests\Tour;
use XaviMontero\ThrasherPortage\Tour\Uri;
class UriTest extends \PHPUnit_Framework_TestCase
{
private $sut;
public function testCreationIsOfProperClassWhenUriIsValid()
{
$sut = new Uri( 'http://example.com' );
$this->assertInstanceOf( 'XaviMontero\\ThrasherPortage\\Tour\\Uri', $sut );
}
/**
* @dataProvider urlIsValidProvider
* @dataProvider urnIsValidProvider
*/
public function testGetUriAsStringWhenUriIsValid( string $uri )
{
$sut = new Uri( $uri );
$actual = $sut->getUriAsString();
$this->assertInternalType( 'string', $actual );
$this->assertEquals( $uri, $actual );
}
public function urlIsValidProvider()
{
return
[
[ 'http://example-server' ],
[ 'http://example.com' ],
[ 'http://example.com/' ],
[ 'http://subdomain.example.com/path/?parameter1=value1¶meter2=value2' ],
[ 'random-protocol://example.com' ],
[ 'http://example.com:80' ],
[ 'http://example.com?no-path-separator' ],
[ 'http://example.com/pa%20th/' ],
[ 'ftp://example.org/resource.txt' ],
[ 'file://../../../relative/path/needs/protocol/resource.txt' ],
[ 'http://example.com/#one-fragment' ],
[ 'http://example.edu:8080#one-fragment' ],
];
}
public function urnIsValidProvider()
{
return
[
[ 'urn:isbn:0-486-27557-4' ],
[ 'urn:example:mammal:monotreme:echidna' ],
[ 'urn:mpeg:mpeg7:schema:2001' ],
[ 'urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
[ 'rare-urn:uuid:6e8bc430-9c3a-11d9-9669-0800200c9a66' ],
[ 'urn:FOO:a123,456' ]
];
}
/**
* @dataProvider urlIsNotValidProvider
* @dataProvider urnIsNotValidProvider
*/
public function testCreationThrowsExceptionWhenUriIsNotValid( string $uri )
{
$this->expectException( 'RuntimeException' );
$this->sut = new Uri( $uri );
}
public function urlIsNotValidProvider()
{
return
[
[ 'only-text' ],
[ 'http//missing.colon.example.com/path/?parameter1=value1¶meter2=value2' ],
[ 'missing.protocol.example.com/path/' ],
[ 'http://example.com\\bad-separator' ],
[ 'http://example.com|bad-separator' ],
[ 'ht tp://example.com' ],
[ 'http://exampl e.com' ],
[ 'http://example.com/pa th/' ],
[ '../../../relative/path/needs/protocol/resource.txt' ],
[ 'http://example.com/#two-fragments#not-allowed' ],
[ 'http://example.edu:portMustBeANumber#one-fragment' ],
];
}
public function urnIsNotValidProvider()
{
return
[
[ 'urn:mpeg:mpeg7:sch ema:2001' ],
[ 'urn|mpeg:mpeg7:schema:2001' ],
[ 'urn?mpeg:mpeg7:schema:2001' ],
[ 'urn%mpeg:mpeg7:schema:2001' ],
[ 'urn#mpeg:mpeg7:schema:2001' ],
];
}
}
<?php
declare( strict_types = 1 );
namespace XaviMontero\ThrasherPortage\Tour;
class Uri
{
/** @var string */
private $uri;
public function __construct( string $uri )
{
$this->assertUriIsCorrect( $uri );
$this->uri = $uri;
}
public function getUriAsString()
{
return $this->uri;
}
private function assertUriIsCorrect( string $uri )
{
// https://stackoverflow.com/questions/30847/regex-to-validate-uris
// http://snipplr.com/view/6889/regular-expressions-for-uri-validationparsing/
if( ! preg_match( "/^([a-z][a-z0-9+.-]*):(?:\\/\\/((?:(?=((?:[a-z0-9-._~!$&'()*+,;=:]|%[0-9A-F]{2})*))(\\3)@)?(?=(\\[[0-9A-F:.]{2,}\\]|(?:[a-z0-9-._~!$&'()*+,;=]|%[0-9A-F]{2})*))\\5(?::(?=(\\d*))\\6)?)(\\/(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\8)?|(\\/?(?!\\/)(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/]|%[0-9A-F]{2})*))\\10)?)(?:\\?(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\11)?(?:#(?=((?:[a-z0-9-._~!$&'()*+,;=:@\\/?]|%[0-9A-F]{2})*))\\12)?$/i", $uri ) )
{
throw new \RuntimeException( "URI has not a valid format." );
}
}
}
There are 65 assertions in 46 tests. Caution: there are 2 data-providers for valid and 2 more for invalid expressions. One is for URLs and the other for URNs. If you are using a version of PhpUnit of v5.6* or earlier then you need to join the two data providers into a single one.
xavi@bromo:~/custom_www/hello-trip/mutant-migrant$ vendor/bin/phpunit
PHPUnit 5.7.3 by Sebastian Bergmann and contributors.
.............................................. 46 / 46 (100%)
Time: 82 ms, Memory: 4.00MB
OK (46 tests, 65 assertions)
There's is 100% of code-coverage in this sample URI checker.