Parsing Domain Name From URL In PHP

Submitted by ⅰ亾dé卋堺 on 2019-12-05 20:36:43

The domain is stored in $_SERVER['HTTP_HOST'].

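EDIT: I believe this returns the whole domain, subdomains included. To get just the registrable part (e.g. example.co.uk rather than www.example.co.uk; loosely called the "top-level" domain in the code), you could do this: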

// Add all your wanted subdomains that act as top-level domains, here (e.g. 'co.cc' or 'co.uk')
// As array key, use the last part ('cc' and 'uk' in the above examples) and the first part as sub-array elements for that key
$allowed_subdomains = array(
    'cc'    => array(
        'co'
    ),
    'uk'    => array(
        'co'
    )
);

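// HTTP_HOST may carry a port (e.g. "example.co.uk:8080"), so strip it first
$domain = preg_replace('/:\d+$/', '', $_SERVER['HTTP_HOST']);
$parts = explode('.', $domain);
$top_level = array_pop($parts);

// Take care of allowed subdomains (e.g. turn 'uk' into 'co.uk')
if (isset($allowed_subdomains[$top_level]))
{
    if (in_array(end($parts), $allowed_subdomains[$top_level]))
        $top_level = array_pop($parts).'.'.$top_level;
}

// Prepend the registered name, if there is one (a bare host such as 'localhost' has none)
if ($parts)
    $top_level = array_pop($parts).'.'.$top_level;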

You can use parse_url() to split it up and get what you want. Here's an example...

    $url = 'http://www.google.com/search?hl=en&source=hp&q=google&btnG=Google+Search&meta=lr%3D&aq=&oq=dasd';
    print_r(parse_url($url));

This will print:

Array
(
    [scheme] => http
    [host] => www.google.com
    [path] => /search
    [query] => hl=en&source=hp&q=google&btnG=Google+Search&meta=lr%3D&aq=&oq=dasd
)
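
If only the host is needed, parse_url() also accepts a component constant as its second argument. A minimal sketch, using a shortened version of the URL above; note that the result still includes the www subdomain, so this alone does not yield the registrable domain:

```php
<?php
$url = 'http://www.google.com/search?hl=en&source=hp&q=google';

// PHP_URL_HOST makes parse_url() return only the host as a string.
// It returns null when the URL has no host component and false
// when the URL is too malformed to parse.
$host = parse_url($url, PHP_URL_HOST);

echo $host, "\n"; // www.google.com
```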

I reckon you'll need a list of all suffixes used after a domain name. http://publicsuffix.org/list/ provides an up-to-date (or so they claim) list of all suffixes currently in use; the list itself is linked from that page. The idea would be to parse that list into a structure, with the different levels split by the dot, starting from the last level:

So, for instance, for the suffixes com.la, com.tr and com.lc

you'd end up with:

[la]=>[com]
[tr]=>[com]
[lc]=>[com]

etc...

Then you'd get the host from base_url (by using parse_url()) and explode it by dots. Then you start matching the values against your structure, starting with the last one:

So for google.com.tr you'd start by matching tr, then com, and then you won't find a match once you get to google, which is what you want: the first label that fails to match is the registered name, giving google.com.tr.
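
The matching described above could be sketched roughly as follows. This is only an illustration under some assumptions: the $suffixes array is a tiny hand-picked sample rather than the real Public Suffix List, the function names buildSuffixTree() and registrableDomain() are made up for this example, and the list's wildcard and exception rules are ignored entirely.

```php
<?php
// Hypothetical sample; a real tree would be built from the full Public Suffix List.
$suffixes = ['com', 'com.la', 'com.tr', 'com.lc'];

// Build a nested array keyed from the last label inwards,
// e.g. 'com.tr' becomes ['tr' => ['com' => []]].
function buildSuffixTree(array $suffixes): array
{
    $tree = [];
    foreach ($suffixes as $suffix) {
        $node = &$tree;
        foreach (array_reverse(explode('.', $suffix)) as $label) {
            if (!isset($node[$label])) {
                $node[$label] = [];
            }
            $node = &$node[$label];
        }
        unset($node); // break the reference before handling the next suffix
    }
    return $tree;
}

// Walk the host's labels from right to left. The first label that is not
// in the tree is the registered name, so return it plus the matched suffix.
// Returns null when the host is itself a suffix or has no known suffix.
function registrableDomain(string $host, array $tree): ?string
{
    $labels = explode('.', strtolower($host));
    $matched = [];
    $node = $tree;
    while ($labels) {
        $label = array_pop($labels);
        array_unshift($matched, $label);
        if (!isset($node[$label])) {
            return count($matched) > 1 ? implode('.', $matched) : null;
        }
        $node = $node[$label];
    }
    return null;
}

$tree = buildSuffixTree($suffixes);
$host = parse_url('http://subsub.sub.google.com.tr/search', PHP_URL_HOST);
echo registrableDomain($host, $tree), "\n"; // google.com.tr
```

Because the nested arrays do not record which nodes are complete suffixes, this sketch treats any label path present in the tree as a suffix; a fuller version would mark suffix ends explicitly.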

Regex and parse_url() alone aren't the solution here.

You need a package that uses the Public Suffix List; only this way can you correctly extract domains with second- and third-level suffixes (co.uk, a.bg, b.bg, etc.). I recommend using TLD Extract.

Here is some example code:

$extract = new LayerShifter\TLDExtract\Extract();

$result = $extract->parse('http://subsub.sub.google.com.tr');
$result->getRegistrableDomain(); // will return (string) 'google.com.tr'