This is a so-called IDN (internationalized domain name).
Clients that support IDNs normalize the name according to the IDNA2008 standard (RFC 5890), then encode the remaining Unicode characters with Punycode (RFC 3492) before submitting the name for DNS resolution.
By specification, a very large range of Unicode characters is valid in an IDN, and every top-level domain authority can further define which Unicode characters it accepts, so it is hard to create and maintain a reliable regex.
If you want to accept IDNs in your application, you should work with the encoded version internally. The PHP intl extension provides two functions, idn_to_ascii() and idn_to_utf8(), to encode and decode IDNs:
echo idn_to_ascii('täst.de');
xn--tst-qla.de
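To show both directions, here is a minimal sketch of a round trip. The wrapper names toAsciiDomain() and toUnicodeDomain() are made up for illustration; the sketch assumes the intl extension is loaded and passes the UTS #46 variant explicitly, which is an assumption about how you want the conversion done rather than something required by the answer above.

```php
<?php
// Hypothetical wrappers around the intl functions; the names are illustrative only.

/** Encode a Unicode domain to its ASCII (Punycode) form; null if conversion fails. */
function toAsciiDomain(string $domain): ?string
{
    $ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    return $ascii === false ? null : $ascii;
}

/** Decode an ASCII (Punycode) domain back to Unicode; null if conversion fails. */
function toUnicodeDomain(string $domain): ?string
{
    $unicode = idn_to_utf8($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    return $unicode === false ? null : $unicode;
}

$ascii = toAsciiDomain('täst.de');
echo $ascii, "\n";                   // encoded form, safe to store and compare
echo toUnicodeDomain($ascii), "\n";  // decoded form, for display to users
```

Storing the ASCII form and decoding only for display keeps comparisons and lookups consistent.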
After encoding, the domain will pass a traditional ASCII-only regex check.
Simple validation:
$url = "http://example.com/";
if (preg_match('/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i', $url)) {
    echo 'OK';
} else {
    echo 'Invalid URL.';
}
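Putting the two steps together, a rough sketch could encode the host first and then run the same pattern. The function name isValidIdnUrl() and the host-splitting pattern are my own assumptions, not part of any library; the sketch also assumes UTF-8 input and URLs without a user:password@ part.

```php
<?php
// Hypothetical helper: encode the host with idn_to_ascii(), then apply the regex check.
function isValidIdnUrl(string $url): bool
{
    // Split into scheme, host and the rest, so that only the host is converted.
    // With the /u flag, invalid UTF-8 makes preg_match() return false, which is rejected here too.
    if (!preg_match('~^([a-z][a-z0-9+.-]*)://([^/:?#]+)(.*)$~iu', $url, $m)) {
        return false;
    }
    [, $scheme, $host, $rest] = $m;

    $asciiHost = idn_to_ascii($host, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($asciiHost === false) {
        return false; // the host could not be converted
    }

    // Same pattern as in the simple validation, applied to the encoded URL.
    $pattern = '/^(http|https|ftp):\/\/([A-Z0-9][A-Z0-9_-]*(?:\.[A-Z0-9][A-Z0-9_-]*)+):?(\d+)?\/?/i';
    return preg_match($pattern, $scheme . '://' . $asciiHost . $rest) === 1;
}

var_dump(isValidIdnUrl('http://täst.de/'));
```

The host is split off with a small pattern instead of parse_url(), because only the domain part should go through idn_to_ascii().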
EDIT:
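If you want a real DNS verification, you can use dns_get_record() (PHP 5 and later) or gethostbyname(). Note that gethostbyaddr() goes the other way and resolves an IP address to a host name.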
e.g.
$domain = 'ελληνικά.idn.icann.org';
// idn_to_ascii() returns false if the name cannot be converted
$idnDomain = idn_to_ascii( $domain );
if ( $idnDomain !== false && ( $dnsResult = dns_get_record( $idnDomain, DNS_ANY ) ) )
{
    echo $idnDomain , "\n";
    print_r( $dnsResult );
}
else
{
    echo "failed to lookup domain\n";
}
Result:
xn--hxargifdar.idn.icann.org
Array
(
    [0] => Array
        (
            [host] => xn--hxargifdar.idn.icann.org
            [class] => IN
            [ttl] => 21456
            [type] => A
            [ip] => 199.7.85.10
        )

    [1] => Array
        (
            [host] => xn--hxargifdar.idn.icann.org
            [class] => IN
            [ttl] => 21600
            [type] => AAAA
            [ipv6] => 2620::2830:230:0:0:0:10
        )

)
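If you only need a yes/no answer rather than the full records, something along these lines might be enough. This is a sketch under assumptions: domainResolves() is a made-up name, the optional $lookup callable exists only so the DNS step can be swapped out (for tests or a different resolver), and the default asks checkdnsrr() for A and AAAA records only.

```php
<?php
// Hypothetical helper: true if the (possibly internationalized) domain has an address record.
function domainResolves(string $domain, ?callable $lookup = null): bool
{
    // Always query DNS with the encoded (ASCII) form of the name.
    $ascii = idn_to_ascii($domain, IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($ascii === false) {
        return false;
    }

    // Default lookup: check for IPv4 (A) or IPv6 (AAAA) records.
    $lookup = $lookup ?? static function (string $host): bool {
        return checkdnsrr($host, 'A') || checkdnsrr($host, 'AAAA');
    };

    return (bool) $lookup($ascii);
}
```

Calling domainResolves('ελληνικά.idn.icann.org') would perform a live lookup, so the result depends on your resolver and network at the time.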