Parsing domain from a URL

独厮守ぢ 2020-11-22 12:26

I need to build a function which parses the domain from a URL.

So, with

http://google.com/dhasjkdas/sadsdds/sdda/sdads.html

or

18 Answers
  • 2020-11-22 12:28

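    This generally works well as long as the input URL is well formed. It keeps only the last two labels of the host, so it drops any subdomain; note that for a multi-part suffix such as example.co.uk it would return co.uk.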

    $host = parse_url( $Row->url, PHP_URL_HOST );
    $parts = explode( '.', $host );
    $parts = array_reverse( $parts );
    $domain = $parts[1].'.'.$parts[0];
    

    Example

    Input: http://www2.website.com:8080/some/file/structure?some=parameters

    Output: website.com
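
    The snippet above assumes that parse_url() finds a host and that the host has at least two labels. A minimal sketch of the same idea with those two cases guarded might look like this; the function name last_two_labels is made up for illustration and is not part of the answer.

```php
<?php
// Hypothetical helper (not from the answer): returns the last two labels
// of the host, the host itself if it has a single label, or null if
// parse_url() finds no host at all.
function last_two_labels(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null; // e.g. the URL has no scheme, so everything is treated as a path
    }
    $host = strtolower($host);
    $parts = explode('.', $host);
    if (count($parts) < 2) {
        return $host; // single label such as "localhost"
    }
    return implode('.', array_slice($parts, -2));
}
```

    It shares the limitation of the original: a host under a multi-part suffix such as co.uk is still cut down to the suffix itself.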

  • 2020-11-22 12:29
    $domain = str_ireplace('www.', '', parse_url($url, PHP_URL_HOST));
    

    This returns google.com for both http://google.com/... and http://www.google.com/...
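
    One caveat with the one-liner above: str_ireplace() removes "www." wherever it occurs in the host, not only at the start. A sketch that anchors the match to the beginning of the host could look like the following; host_without_www is a hypothetical name chosen for this example.

```php
<?php
// Hypothetical helper (not from the answer): strips a single leading
// "www." from the host, case-insensitively, and nothing else.
function host_without_www(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return null; // parse_url() found no host
    }
    return preg_replace('/^www\./i', '', $host);
}
```

    With this version a host such as awww.example.com is left untouched, whereas the str_ireplace() call would turn it into aexample.com.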

  • 2020-11-22 12:30

    The code that was supposed to work 100% of the time didn't cut it for me. I patched the example a little, but found code in it that wasn't helping and other problems, so I replaced it with a couple of functions (one of them fetches the list from Mozilla once instead of asking for it every time, and I removed the cache system). This has been tested against a set of 1000 URLs and seemed to work.

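    function domain($url)
    {
        global $subtlds; // list of multi-part suffixes built by get_tlds()
        $url = strtolower($url);
    
        // The input is expected without a scheme, so one is prepended for parse_url().
        $host = parse_url('http://'.$url, PHP_URL_HOST);
    
        // Default: keep the last two labels (example.com).
        preg_match("/[^\.\/]+\.[^\.\/]+$/", $host, $matches);
        foreach ($subtlds as $sub) {
            // If the host ends in a listed suffix, keep the last three labels instead.
            if (preg_match('/\.'.preg_quote($sub, '/').'$/', $host)) {
                preg_match("/[^\.\/]+\.[^\.\/]+\.[^\.\/]+$/", $host, $matches);
            }
        }
    
        return isset($matches[0]) ? $matches[0] : null;
    }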
    
    function get_tlds() {
        $address = 'http://mxr.mozilla.org/mozilla-central/source/netwerk/dns/effective_tld_names.dat?raw=1';
        $subtlds = array();
        $content = file($address);
        if ($content === false) $content = array(); // list could not be fetched
        foreach ($content as $num => $line) {
            $line = trim($line);
            if ($line == '') continue;
            if (substr($line, 0, 2) == '//') continue;            // skip comment lines
            $line = preg_replace("/[^a-zA-Z0-9\.]/", '', $line);   // keep letters, digits and dots only
            if ($line == '') continue;
            if ($line[0] == '.') $line = substr($line, 1);         // drop a leading dot
            if (!strstr($line, '.')) continue;                     // single-label TLDs are not needed
            $subtlds[] = $line;
        }
    
        // Make sure a few common suffixes are present even if the download failed.
        $subtlds = array_merge(array(
                'co.uk', 'me.uk', 'net.uk', 'org.uk', 'sch.uk', 'ac.uk', 
                'gov.uk', 'nhs.uk', 'police.uk', 'mod.uk', 'asn.au', 'com.au',
                'net.au', 'id.au', 'org.au', 'edu.au', 'gov.au', 'csiro.au'
            ), $subtlds);
    
        $subtlds = array_unique($subtlds);
    
        return $subtlds;    
    }
    

    Then use it like

    $subtlds = get_tlds();
    echo domain('www.example.com'); // outputs: example.com
    echo domain('www.example.uk.com'); // outputs: example.uk.com
    echo domain('www.example.fr'); // outputs: example.fr
    

    I know I should have turned this into a class, but didn't have time.
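
    For what it's worth, the class version mentioned above might look roughly like the sketch below. It is only an illustration under a few assumptions: the class and method names are made up, the suffix list is passed to the constructor instead of being read from a global, and it needs PHP 7.4+ for the typed property. Unlike the functions above, it keeps exactly one label in front of the longest listed suffix, and it only prepends http:// when the input has no scheme.

```php
<?php
// Hypothetical class; the name and methods are illustrative, not from the answer.
final class DomainExtractor
{
    /** @var array<string, true> suffixes such as "co.uk", used as a lookup set */
    private array $suffixes = [];

    public function __construct(array $suffixes)
    {
        foreach ($suffixes as $suffix) {
            $this->suffixes[strtolower(trim($suffix, '.'))] = true;
        }
    }

    public function domain(string $hostOrUrl): ?string
    {
        $input = strtolower(trim($hostOrUrl));
        // Prepend a scheme only when none is present, so parse_url() can find the host.
        if (!preg_match('#^[a-z][a-z0-9+.-]*://#', $input)) {
            $input = 'http://' . $input;
        }
        $host = parse_url($input, PHP_URL_HOST);
        if (!is_string($host) || $host === '') {
            return null;
        }
        $labels = explode('.', $host);
        $count = count($labels);
        // Try the longest candidate suffix first; keep one label in front of it.
        for ($i = 1; $i < $count; $i++) {
            $candidate = implode('.', array_slice($labels, $i));
            if (isset($this->suffixes[$candidate])) {
                return $labels[$i - 1] . '.' . $candidate;
            }
        }
        // No listed suffix matched: fall back to the last two labels.
        return $count >= 2 ? implode('.', array_slice($labels, -2)) : $host;
    }
}

$extractor = new DomainExtractor(['co.uk', 'uk.com', 'com.au']);
echo $extractor->domain('www.example.co.uk'), "\n"; // example.co.uk
```

    In practice the array passed to the constructor would be the result of get_tlds(), or a locally stored copy of the same list.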

  • 2020-11-22 12:30

    I've found that @philfreo's solution (referenced from php.net) works pretty well, but in some cases it triggers PHP "Notice" and "Strict Standards" messages. Here is a fixed version of the code.

    function getHost($url) { 
       $parseUrl = parse_url(trim($url)); 
       if(isset($parseUrl['host']))
       {
           $host = $parseUrl['host'];
       }
       else
       {
            $path = explode('/', $parseUrl['path']);
            $host = $path[0];
       }
       return trim($host); 
    } 
    
    echo getHost("http://example.com/anything.html");           // example.com
    echo getHost("http://www.example.net/directory/post.php");  // www.example.net
    echo getHost("https://example.co.uk");                      // example.co.uk
    echo getHost("www.example.net");                            // www.example.net
    echo getHost("subdomain.example.net/anything");             // subdomain.example.net
    echo getHost("example.net");                                // example.net
    
  • 2020-11-22 12:31
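    function get_domain($url = SITE_URL) // SITE_URL must be a constant defined by your application
    {
        // Match "label.suffix" at the end of the host; the suffix is 2 to 6 characters of [a-z.].
        preg_match("/[a-z0-9\-]{1,63}\.[a-z\.]{2,6}$/", parse_url($url, PHP_URL_HOST), $_domain_tld);
        return isset($_domain_tld[0]) ? $_domain_tld[0] : null;
    }
    
    // The leftmost match wins, so a subdomain is kept whenever "domain.tld" itself fits in 6 characters.
    get_domain('http://www.cdl.gr'); //www.cdl.gr
    get_domain('http://cdl.gr'); //cdl.gr
    get_domain('http://www2.cdl.gr'); //www2.cdl.gr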
    
  • 2020-11-22 12:33

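    parse_url didn't work for me: it only returned the path (which typically happens when the URL has no scheme). I went back to basic string functions instead; the third argument of strstr requires PHP 5.3+: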

    $url  = str_replace('http://', '', strtolower( $s->website));
    if (strpos($url, '/'))  $url = strstr($url, '/', true);
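
    The two lines above only strip a literal http:// prefix, so an https:// URL keeps its scheme. A sketch of the same string-only approach that accepts any scheme, or none, might look like this; host_from_string is a hypothetical name, and a port or user:pass@ part is deliberately left in the result.

```php
<?php
// Hypothetical helper (not from the answer): extracts the host part of a
// URL using plain string functions, without relying on parse_url().
function host_from_string(string $url): string
{
    $url = strtolower(trim($url));
    // Drop any leading "scheme://" prefix, not just "http://".
    $url = preg_replace('#^[a-z][a-z0-9+.-]*://#', '', $url);
    // Cut at the first "/", "?" or "#", whichever comes first.
    return substr($url, 0, strcspn($url, '/?#'));
}
```

    strcspn() returns the length of the leading segment that contains none of the listed characters, so a URL with no path is returned whole.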
    