I'm working on an email validation regex in PHP and I need to know how long the TLD could possibly be and still be valid. I did a few searches but couldn't find much information.
The longest TLD currently in existence is 24 characters long, and subject to change. The maximum TLD length specified by RFC 1034 is 63 octets.
To get the length of the longest existing TLD:
wget -qO - http://data.iana.org/TLD/tlds-alpha-by-domain.txt | tail -n+2 | wc -L
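Here's what that command does: wget -qO - downloads the list quietly and writes it to standard output, tail -n+2 skips the first line (a comment header), and wc -L prints the length of the longest remaining line (the -L option is available in GNU wc).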
Alternative using curl (thanks to Stefan):
curl -s http://data.iana.org/TLD/tlds-alpha-by-domain.txt | tail -n+2 | wc -L
This PHP code fetches an up-to-date, vertical-bar-separated list of TLDs in UTF-8 that can be used directly in a regular expression:
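<?php
// Requires allow_url_fopen and the intl extension (for idn_to_utf8).
function getTLDs($separator){
    $tlds=file('http://data.iana.org/TLD/tlds-alpha-by-domain.txt',FILE_IGNORE_NEW_LINES|FILE_SKIP_EMPTY_LINES);
    if($tlds===false){ throw new RuntimeException('Could not download the TLD list'); }
    array_shift($tlds); // remove heading comment
    // convert punycode entries (xn--...) to lowercase UTF-8
    $tlds=array_map(function($e){ return idn_to_utf8(strtolower(trim($e))); },$tlds);
    usort($tlds,function($a,$b){ return strlen($b)-strlen($a); }); // sort from longest to shortest
    return implode($separator,$tlds);
}
echo getTLDs('|');
?>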
To match a host name you could use it like this:
$tlds=getTLDs('|');
if (preg_match("{([\da-z\.-]+)\.($tlds)}iu",$address)) {
    // $address contains a host name that ends in a known TLD
}
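The same idea can be sketched in JavaScript, the other language used in this thread. This is only a minimal sketch under stated assumptions: buildHostRegex is a hypothetical helper name, and it assumes the TLD list has already been downloaded and split into an array of strings. Unlike the PHP pattern above, this version is anchored so it matches a whole host name.

```javascript
// Hypothetical helper: builds a regex that matches a whole host name
// ending in one of the given TLDs. Assumes `tlds` is an array of strings.
function buildHostRegex(tlds) {
    // Copy and sort from longest to shortest, mirroring the PHP version.
    var sorted = tlds.slice().sort(function (a, b) { return b.length - a.length; });
    // Escape regex metacharacters in case an entry contains any.
    var escaped = sorted.map(function (tld) {
        return tld.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    });
    // Anchored at both ends; the "i" flag makes the match case-insensitive.
    return new RegExp('^([\\da-z.-]+)\\.(' + escaped.join('|') + ')$', 'i');
}

var hostRegex = buildHostRegex(['com', 'museum', 'xn--clchc0ea0b2g2a9gcd']);
console.log(hostRegex.test('example.museum'));  // true
console.log(hostRegex.test('example.invalid')); // false
```

Because the pattern is anchored, the order of the alternatives does not affect the result here. The sort is kept only so the behaviour stays the same if you remove the anchors.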
For practical purposes, a TLD can be almost any length (up to the DNS label limit). New TLDs are added all the time. In the future there will be more TLDs that are not regulated by the entity that currently regulates most of them. We also won't use email in the future the way we do now. That said:
You don't ever need to validate an email address. If you want to slow people down and get an idea of whether they're actually human, include a CAPTCHA. If you need to confirm that an email address works, send an email with a validation link the recipient can open. And if you aren't throttling submissions that trigger things like verification emails, it won't matter whether the address is technically valid: the form will be abused regardless.
DNS allows for a maximum of 63 characters for an individual label.
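Since a TLD is just the last label of a domain name, that limit gives a safe upper bound for a regex. Here is a minimal sketch, assuming you only want a syntactic check and not a lookup against the IANA list. isPlausibleTld is a hypothetical name, and the letters-digits-hyphens rule is the conventional host name label syntax, not something stated in this thread:

```javascript
// Hypothetical helper: syntactic check only, does not consult the IANA list.
// Accepts a label of 1 to 63 characters made of letters, digits and hyphens
// that does not start or end with a hyphen.
function isPlausibleTld(label) {
    return /^[a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?$/i.test(label);
}

console.log(isPlausibleTld('museum'));                 // true
console.log(isPlausibleTld('xn--clchc0ea0b2g2a9gcd')); // true
console.log(isPlausibleTld('a'.repeat(64)));           // false
```

This only bounds the length and character set. It will accept labels that are not delegated TLDs, so combine it with the IANA list if you need to know that a TLD actually exists.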
At the time of writing, the longest TLD using Latin letters is .MUSEUM, but there are also internationalized TLDs with non-Latin characters. The longest of those is XN--CLCHC0EA0B2G2A9GCD. Soon it will also be possible to register your own TLD for a high price, so longer TLDs may appear.
I'm a .NET developer, so here is a JavaScript way to determine the longest TLD currently available. It logs the length of the longest TLD, which you can then use in your regex.
Try the following code snippet:
function getTLD() {
    var request = new XMLHttpRequest();
    request.open('GET', 'http://data.iana.org/TLD/tlds-alpha-by-domain.txt', true);
    // Register the handler before sending the request.
    request.onreadystatechange = function () {
        if (request.readyState === 4 && request.status === 200) {
            var type = request.getResponseHeader('Content-Type') || '';
            if (type.indexOf("text") !== -1) { // indexOf returns -1 when not found
                var tldArr = request.responseText.split('\n');
                tldArr.splice(0, 1); // drop the leading comment line
                var length = 0;
                var longest;
                for (var i = 0; i < tldArr.length; i++) {
                    var tld = tldArr[i].trim(); // ignore stray whitespace
                    if (tld.length > length) {
                        length = tld.length;
                        longest = tld;
                    }
                }
                // The request is asynchronous, so the result is logged, not returned.
                console.log("Longest >> " + longest + " >> " + length);
            }
        }
    };
    request.send(null);
}
<button onclick="getTLD()">Get TLD</button>