Parsing domain from a URL

独厮守ぢ 2020-11-22 12:26

I need to build a function which parses the domain from a URL.

So, with

http://google.com/dhasjkdas/sadsdds/sdda/sdads.html

or

18 answers
  • 2020-11-22 12:43
    function getTrimmedUrl($link)
    {
        // Strip the scheme and any "www." so the host becomes the first "/"-separated segment.
        $str = str_replace(["www.", "https://", "http://"], '', $link);
        $parts = explode("/", $str);
        return strtolower($parts[0]);
    }
    
  • 2020-11-22 12:47

    If you want to extract the host from a string such as http://google.com/dhasjkdas/sadsdds/sdda/sdads.html, parse_url() is an acceptable solution.

    But if you want to extract the domain or its parts, you need a package that uses the Public Suffix List. Yes, you can use string functions around parse_url(), but they will sometimes produce incorrect results.

    I recommend TLDExtract for domain parsing. Here is sample code that shows the difference:

    $extract = new LayerShifter\TLDExtract\Extract();
    
    # For 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'
    
    $url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
    
    parse_url($url, PHP_URL_HOST); // will return 'google.com'
    
    $result = $extract->parse($url);
    $result->getFullHost(); // will return 'google.com'
    $result->getRegistrableDomain(); // will return 'google.com'
    $result->getSuffix(); // will return 'com'
    
    # For 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html'
    
    $url = 'http://search.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
    
    parse_url($url, PHP_URL_HOST); // will return 'search.google.com'
    
    $result = $extract->parse($url);
    $result->getFullHost(); // will return 'search.google.com'
    $result->getRegistrableDomain(); // will return 'google.com'
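
    To make the "incorrect results" point concrete, here is a minimal sketch of the kind of string handling around parse_url() that breaks on multi-part suffixes. The function name naive_registrable_domain() is made up for illustration and is not part of any library; it simply keeps the last two labels of the host.

    ```php
    <?php
    // Hypothetical helper: keeps only the last two dot-separated labels of the host.
    function naive_registrable_domain(string $url): ?string
    {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host)) {
            return null; // no host found, or the URL could not be parsed
        }
        $labels = explode('.', $host);
        return implode('.', array_slice($labels, -2));
    }

    $ok  = naive_registrable_domain('http://search.google.com/dhasjkdas/sdads.html'); // 'google.com'
    $bad = naive_registrable_domain('http://www.google.co.uk/');                      // 'co.uk'
    ```

    The second result is a public suffix, not a registrable domain, which is why a Public Suffix List lookup is needed for the general case.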
    
  • 2020-11-22 12:49

    I'm adding this answer late since this is the answer that pops up most on Google...

    You can use PHP to...

    // parse_url() needs the scheme; without it the whole string is parsed as a path.
    $url = "http://www.google.co.uk";
    $host = parse_url($url, PHP_URL_HOST);
    // $host == "www.google.co.uk"
    

    to grab the host, but not the private domain to which the host refers. (For example, www.google.co.uk is the host, but google.co.uk is the private domain.)

    To grab the private domain, you need to know the list of public suffixes under which one can register a private domain. This list is curated by Mozilla at https://publicsuffix.org/

    The below code works when an array of public suffixes has been created already. Simply call

    $domain = get_private_domain("www.google.co.uk");
    

    with the remaining code...

    // Find some way to parse the public suffix list above, then load it
    // into a PHP array keyed by suffix, e.g. 'co.uk' => true.
    $suffix = [/* ... all valid public suffixes ... */];
    
    // Returns the longest public suffix that $host ends with, or false.
    function get_public_suffix($host) {
      $parts = explode(".", $host); // split() was removed in PHP 7
      while (count($parts) > 0) {
        if (is_public_suffix(join(".", $parts)))
          return join(".", $parts);
    
        array_shift($parts);
      }
    
      return false;
    }
    
    function is_public_suffix($host) {
      global $suffix;
      return isset($suffix[$host]);
    }
    
    // Returns the public suffix plus one more label, e.g. "google.co.uk".
    function get_private_domain($host) {
      $public = get_public_suffix($host);
      if ($public === false)
        return false; // no known public suffix matched
    
      $public_parts = explode(".", $public);
      $all_parts = explode(".", $host);
    
      $private = [];
    
      for ($x = 0; $x < count($public_parts); ++$x) 
        $private[] = array_pop($all_parts);
    
      if (count($all_parts) > 0)
        $private[] = array_pop($all_parts);
    
      return join(".", array_reverse($private));
    }
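
    The "find some way to parse the list" step could look roughly like the sketch below. load_public_suffixes() is a hypothetical helper, and the $sample string only stands in for the real list file. The list format has one rule per line with "//" comments; this sketch assumes that and deliberately skips wildcard ("*.") and exception ("!") rules, so it is not a full implementation of the list's matching algorithm.

    ```php
    <?php
    // Hypothetical helper: turns Public Suffix List text into an array keyed by suffix.
    function load_public_suffixes(string $listText): array
    {
        $suffixes = [];
        foreach (preg_split('/\R/', $listText) as $line) {
            $line = trim($line);
            // Skip blank lines and "//" comment lines.
            if ($line === '' || strpos($line, '//') === 0) {
                continue;
            }
            // Simplification: wildcard and exception rules are ignored here.
            if ($line[0] === '*' || $line[0] === '!') {
                continue;
            }
            $suffixes[strtolower($line)] = true;
        }
        return $suffixes;
    }

    // Illustrative excerpt in the list's format, not the real file.
    $sample = "// sample excerpt\ncom\n\nuk\nco.uk\n*.ck\n!www.ck\n";
    $suffix = load_public_suffixes($sample);
    // $suffix now has the keys 'com', 'uk' and 'co.uk'
    ```

    Keying the array by suffix is what lets an isset($suffix[$host]) lookup work; with a downloaded copy of the list, the same function could be fed the output of file_get_contents().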
    
  • 2020-11-22 12:51

    Check out parse_url():

    $url = 'http://google.com/dhasjkdas/sadsdds/sdda/sdads.html';
    $parse = parse_url($url);
    echo $parse['host']; // prints 'google.com'
    

    parse_url() doesn't handle badly mangled URLs very well, but it is fine if you generally expect well-formed URLs.
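
    One common kind of "mangled" input is a URL with no scheme, which parse_url() treats as a path and so reports no host. A minimal sketch of a defensive wrapper follows; host_from_url() is a made-up name, and defaulting to "http://" is an assumption that only matters for parsing, not for the result.

    ```php
    <?php
    // Hypothetical wrapper: returns the lower-cased host, or null if none can be found.
    function host_from_url(string $url): ?string
    {
        $url = trim($url);
        // Without a "scheme://" prefix, parse_url() would parse the string as a path.
        if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
            $url = 'http://' . ltrim($url, '/');
        }
        $host = parse_url($url, PHP_URL_HOST);
        // parse_url() gives false for seriously malformed input and null when no host is present.
        return (is_string($host) && $host !== '') ? strtolower($host) : null;
    }

    $a = host_from_url('google.com/dhasjkdas/sdads.html');    // 'google.com'
    $b = host_from_url('HTTP://WWW.Google.com/dhasjkdas');    // 'www.google.com'
    $c = host_from_url('');                                   // null
    ```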

  • 2020-11-22 12:53

    Please consider replacing the accepted solution with the following:

    parse_url() will always include any sub-domain(s), so this function doesn't parse domain names very well. Here are some examples:

    $url = 'http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html';
    $parse = parse_url($url);
    echo $parse['host']; // prints 'www.google.com'
    
    echo parse_url('https://subdomain.example.com/foo/bar', PHP_URL_HOST);
    // Output: subdomain.example.com
    
    echo parse_url('https://subdomain.example.co.uk/foo/bar', PHP_URL_HOST);
    // Output: subdomain.example.co.uk
    

    Instead, you may consider this pragmatic solution. It will cover many, but not all domain names -- for instance, lower-level domains such as 'sos.state.oh.us' are not covered.

    function getDomain($url) {
        $host = parse_url($url, PHP_URL_HOST);
    
        if(filter_var($host,FILTER_VALIDATE_IP)) {
            // IP address returned as domain
            return $host; //* or replace with null if you don't want an IP back
        }
    
        $domain_array = explode(".", str_replace('www.', '', $host));
        $count = count($domain_array);
        if( $count>=3 && strlen($domain_array[$count-2])==2 ) {
            // SLD (example.co.uk)
            return implode('.', array_splice($domain_array, $count-3,3));
        } else if( $count>=2 ) {
            // TLD (example.com)
            return implode('.', array_splice($domain_array, $count-2,2));
        }
    }
    
    // Your domains
    echo getDomain('http://google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
    echo getDomain('http://www.google.com/dhasjkdas/sadsdds/sdda/sdads.html'); // google.com
    echo getDomain('http://google.co.uk/dhasjkdas/sadsdds/sdda/sdads.html'); // google.co.uk
    
    // TLD
    echo getDomain('https://shop.example.com'); // example.com
    echo getDomain('https://foo.bar.example.com'); // example.com
    echo getDomain('https://www.example.com'); // example.com
    echo getDomain('https://example.com'); // example.com
    
    // SLD
    echo getDomain('https://more.news.bbc.co.uk'); // bbc.co.uk
    echo getDomain('https://www.bbc.co.uk'); // bbc.co.uk
    echo getDomain('https://bbc.co.uk'); // bbc.co.uk
    
    // IP
    echo getDomain('https://1.2.3.45');  // 1.2.3.45
    

    Finally, Jeremy Kendall's PHP Domain Parser lets you parse the domain name from a URL. The League URI Hostname Parser will also do the job.

  • 2020-11-22 12:53

    Combining the answers of worldofjr and Alix Axel into one small function that will handle most use-cases:

    function get_url_hostname($url) {
    
        $parse = parse_url($url);
        return str_ireplace('www.', '', $parse['host']);
    
    }
    
    get_url_hostname('http://www.google.com/example/path/file.html'); // google.com
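
    One caveat with the function above: str_ireplace() removes "www." wherever it occurs in the host, not only at the start. A small sketch of an anchored alternative follows; strip_leading_www() is a hypothetical name used only for this illustration.

    ```php
    <?php
    // Hypothetical helper: removes a single leading "www." and nothing else.
    function strip_leading_www(string $host): string
    {
        // ^ anchors the match to the start of the host; the "i" flag ignores case.
        return preg_replace('/^www\./i', '', $host) ?? $host;
    }

    $anywhere = str_ireplace('www.', '', 'awww.example.com'); // 'aexample.com'
    $anchored = strip_leading_www('awww.example.com');        // 'awww.example.com'
    $plain    = strip_leading_www('www.google.com');          // 'google.com'
    ```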
    