So let's say I have just-a.domain.com, just-a-domain.info, just.a-domain.net.
How can I remove the extension (.com, .net, .info, ...)?
I need the result without the extension, i.e. just-a.domain, just-a-domain and just.a-domain.
Regex and parse_url() aren't the solution for you.
You need a package that uses the Public Suffix List; only then can you correctly extract domains with second- and third-level TLDs (co.uk, a.bg, b.bg, etc.). I recommend TLD Extract.
Here is an example:
<?php
// Assumes the library was installed with Composer: composer require layershifter/tld-extract
require __DIR__ . '/vendor/autoload.php';

$extract = new LayerShifter\TLDExtract\Extract();
$result = $extract->parse('just.a-domain.net');
$result->getSubdomain(); // will return (string) 'just'
$result->getHostname(); // will return (string) 'a-domain'
$result->getSuffix(); // will return (string) 'net'
$result->getRegistrableDomain(); // will return (string) 'a-domain.net'
strrpos($str, ".") will give you the index of the last period in your string; you can then pass that index to substr() to return the shortened string.
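A minimal sketch of the strrpos()/substr() approach (the function name stripLastLabel is hypothetical). It removes only what follows the last dot, so a two-level suffix such as co.uk is only half removed, and a string without any dot is returned unchanged:

```php
<?php
// Hypothetical helper: drops everything from the last "." onward.
function stripLastLabel(string $domain): string
{
    $pos = strrpos($domain, '.');
    // strrpos() returns false when the string contains no dot at all.
    if ($pos === false) {
        return $domain;
    }
    return substr($domain, 0, $pos);
}

foreach (['just-a.domain.com', 'just-a-domain.info', 'just.a-domain.net'] as $d) {
    echo stripLastLabel($d), PHP_EOL;
}
```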
If you want to remove the part of the domain that is administered by domain name registrars, you will need a list of such suffixes, such as the Public Suffix List.
But since walking through this list and testing each suffix against the domain name is not efficient, use the list only to build an index like this:
$tlds = array(
    // ac : http://en.wikipedia.org/wiki/.ac
    'ac',
    'com.ac',
    'edu.ac',
    'gov.ac',
    'net.ac',
    'mil.ac',
    'org.ac',
    // ad : http://en.wikipedia.org/wiki/.ad
    'ad',
    'nom.ad',
    // …
);
$tldIndex = array_flip($tlds);
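Instead of typing the list by hand, the index could be generated from the text of the Public Suffix List file (public_suffix_list.dat, published at publicsuffix.org). The sketch below assumes the file's plain-text layout of one rule per line with "//" comments; the helper name buildTldIndex is hypothetical, and it deliberately skips wildcard ("*.") and exception ("!") rules, which a complete implementation would have to handle:

```php
<?php
// Hypothetical helper: builds an isset()-friendly index such as
// ['ac' => true, 'com.ac' => true, ...] from the text of the Public Suffix List.
// Simplification: wildcard ("*.ck") and exception ("!www.ck") rules are skipped.
function buildTldIndex(string $pslText): array
{
    $index = [];
    foreach (preg_split('/\R/', $pslText) as $line) {
        $line = trim($line);
        // Skip blank lines and "//" comment lines.
        if ($line === '' || strpos($line, '//') === 0) {
            continue;
        }
        // The rule is the first whitespace-delimited token on the line.
        $rule = preg_split('/\s+/', $line)[0];
        if ($rule[0] === '*' || $rule[0] === '!') {
            continue;
        }
        $index[$rule] = true;
    }
    return $index;
}

// With the real file: $tldIndex = buildTldIndex(file_get_contents('public_suffix_list.dat'));
$sample = "// ac : http://en.wikipedia.org/wiki/.ac\nac\ncom.ac\n\n// ck\n*.ck\n!www.ck\n";
$tldIndex = buildTldIndex($sample);
var_dump(isset($tldIndex['com.ac'])); // bool(true)
```

The values are true rather than the integer positions array_flip() produces; isset() treats both the same way.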
Finding the longest matching suffix then works like this:
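$levels = explode('.', $domain);
$n = count($levels);
// Extend the suffix one label at a time for as long as it is listed in $tldIndex.
for ($length = 0; $length < $n; ++$length) {
    $suffix = implode('.', array_slice($levels, -($length + 1)));
    if (!isset($tldIndex[$suffix])) {
        break;
    }
}
// $length is the number of labels in the longest matching suffix (0 if the TLD is unknown).
$suffix = implode('.', array_slice($levels, $n - $length));
$prefix = implode('.', array_slice($levels, 0, $n - $length));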
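The same longest-suffix search can be packaged as a reusable function. This is a sketch: the name splitDomain is hypothetical, and the five-entry index is only for illustration, where a real one would come from the full Public Suffix List:

```php
<?php
// Hypothetical helper: returns [prefix, suffix] for $domain,
// where suffix is the longest run of trailing labels found in $tldIndex.
function splitDomain(string $domain, array $tldIndex): array
{
    $levels = explode('.', $domain);
    $n = count($levels);
    for ($length = 0; $length < $n; ++$length) {
        if (!isset($tldIndex[implode('.', array_slice($levels, -($length + 1)))])) {
            break;
        }
    }
    return [
        implode('.', array_slice($levels, 0, $n - $length)),
        implode('.', array_slice($levels, $n - $length)),
    ];
}

// Tiny illustrative index, built the same way as $tldIndex above.
$tldIndex = array_flip(['com', 'net', 'info', 'uk', 'co.uk']);

[$prefix, $suffix] = splitDomain('example.co.uk', $tldIndex);
echo $prefix, ' | ', $suffix, PHP_EOL; // example | co.uk
```

A domain whose TLD is not in the index comes back unchanged with an empty suffix.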
Or build a tree that represents the hierarchy of the domain name levels as follows:
$tldTree = array(
    // ac : http://en.wikipedia.org/wiki/.ac
    'ac' => array(
        'com' => true,
        'edu' => true,
        'gov' => true,
        'net' => true,
        'mil' => true,
        'org' => true,
    ),
    // ad : http://en.wikipedia.org/wiki/.ad
    'ad' => array(
        'nom' => true,
    ),
    // …
);
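Rather than writing the tree by hand, it can be generated from the flat $tlds list. In this sketch (the name buildTldTree is hypothetical) leaves are empty arrays instead of true, which the isset()-based lookup treats the same way. Like the hand-written tree, this representation cannot distinguish a listed suffix from a label that only appears as the parent of one:

```php
<?php
// Hypothetical helper: converts ['ac', 'com.ac', ...] into a nested tree
// keyed by labels from right to left, e.g. 'com.ac' becomes $tree['ac']['com'].
function buildTldTree(array $tlds): array
{
    $tree = [];
    foreach ($tlds as $tld) {
        $node = &$tree;
        foreach (array_reverse(explode('.', $tld)) as $label) {
            if (!isset($node[$label])) {
                $node[$label] = [];
            }
            $node = &$node[$label];
        }
        unset($node); // break the reference before handling the next suffix
    }
    return $tree;
}

$tldTree = buildTldTree(['ac', 'com.ac', 'edu.ac', 'ad', 'nom.ad']);
var_dump(isset($tldTree['ac']['edu'])); // bool(true)
```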
Then you can use the following to find the match:
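$levels = explode('.', $domain);
$n = count($levels);
$r = &$tldTree;
$length = 0;
// Walk the labels from right to left, descending into the tree while each label is known.
foreach (array_reverse($levels) as $level) {
    if (isset($r[$level])) {
        $r = &$r[$level];
        $length++;
    } else {
        break;
    }
}
unset($r); // drop the reference so later code cannot modify $tldTree by accident
// Slice by position so that $length == 0 (unknown TLD) yields an empty suffix.
$suffix = implode('.', array_slice($levels, $n - $length));
$prefix = implode('.', array_slice($levels, 0, $n - $length));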
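preg_match('/(.*?)((?:\.co)?\.[a-z]{2,})$/i', $domain, $matches);
$matches[1] will hold the domain and $matches[2] the extension. Note that the pattern only knows about the .co second level, so a suffix such as .com.au is split as google.com + .au.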
<?php
$domains = array("google.com", "google.in", "google.co.in", "google.info", "analytics.google.com");
foreach ($domains as $domain) {
    preg_match('/(.*?)((?:\.co)?\.[a-z]{2,})$/i', $domain, $matches);
    print_r($matches);
}
?>
This produces the following output:
Array
(
    [0] => google.com
    [1] => google
    [2] => .com
)
Array
(
    [0] => google.in
    [1] => google
    [2] => .in
)
Array
(
    [0] => google.co.in
    [1] => google
    [2] => .co.in
)
Array
(
    [0] => google.info
    [1] => google
    [2] => .info
)
Array
(
    [0] => analytics.google.com
    [1] => analytics.google
    [2] => .com
)
$subject = 'just-a.domain.com';
$result = preg_split('/(?=\.[^.]+$)/', $subject);
This produces the following array:
$result[0] == 'just-a.domain';
$result[1] == '.com';