domain regex split

情深已故 2021-01-27 08:20

I have some domains I want to split but can't figure out the regex...

I have:

  • http://www.google.com/tomato
  • http://int.google.c
4 answers
  • 2021-01-27 08:49
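    // The dot in co.uk is escaped and the path is optional, so URLs without a trailing slash match too
    $res = preg_replace( "/^(http:\/\/)([a-z_\-]+\.)*([a-z_\-]+)\.(com|co\.uk|net)(\/.*)?$/im", "\$3", $in );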
    

    Add as many endings as you know.

    Edit: made a mistake :-(
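
    A minimal sketch of the same idea with the endings kept in an array and escaped automatically. The function name domainLabelByRegex and the short ending list are assumptions for illustration, not a complete solution:

    ```php
    <?php
    // Hypothetical helper: returns the label left of a known ending, or null.
    // The ending list is illustrative only; extend it with the ones you need.
    function domainLabelByRegex(string $url, array $endings = ['co.uk', 'com', 'net']): ?string
    {
        // preg_quote() escapes the dots so "co.uk" cannot match "coXuk".
        $alt = implode('|', array_map(function ($e) {
            return preg_quote($e, '/');
        }, $endings));

        // Optional subdomains, the captured label, a known ending,
        // then either the end of the string or the start of a path/port/query.
        $pattern = '/^https?:\/\/(?:[a-z0-9_\-]+\.)*([a-z0-9_\-]+)\.(?:' . $alt . ')(?:[\/:?#].*)?$/i';

        return preg_match($pattern, $url, $m) === 1 ? $m[1] : null;
    }

    var_dump(domainLabelByRegex('http://www.google.com/tomato')); // "google"
    var_dump(domainLabelByRegex('http://google.co.uk'));          // "google"
    var_dump(domainLabelByRegex('http://example.org'));           // null, .org is not in the list
    ```

    Returning null for an unknown ending makes the gap in the list visible instead of silently handing back the whole URL.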

  • 2021-01-27 08:50

    Why are you trying to use regex? There are plenty of native functions available, such as:

    $host = parse_url($url, PHP_URL_HOST);
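
    For reference, a quick sketch of what parse_url() hands back for a few inputs. It only isolates the host name; it does not know which part of the host is the domain. The wrapper name hostOf is made up for this example:

    ```php
    <?php
    // Hypothetical wrapper around parse_url(); returns the host or null.
    function hostOf(string $url): ?string
    {
        $host = parse_url($url, PHP_URL_HOST);
        return is_string($host) ? $host : null;
    }

    var_dump(hostOf('http://www.google.com/tomato')); // "www.google.com"
    var_dump(hostOf('http://google.co.uk'));          // "google.co.uk"
    var_dump(hostOf('google.com/tomato'));            // null: without a scheme this is parsed as a path
    ```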
    

    Update: give this a go. It may need improving, but it's better than regex, IMO.

    function determainDomainName($url)
    {
        $hostname = parse_url($url, PHP_URL_HOST);
        if (!is_string($hostname) || $hostname === '') {
            return null; // no host found, e.g. the URL has no scheme
        }
        $parts = explode(".", $hostname);
    
        switch (count($parts))
        {
            case 1:
                return $parts[0]; // single label, e.g. "localhost"
            case 2:
                return $parts[0]; // "google.com": the first segment is the domain
            case 3:
                if ($parts[0] == "www") // the most common subdomain
                {
                    return $parts[1]; // bypass subdomain / return next segment
                }
    
                if ($parts[1] == "co") // possible in_array() here for multiples; first segment of a double-barrelled TLD
                {
                    return $parts[0]; // bypass double-barrelled TLDs
                }
                // anything else falls through to the guess below
            default:
                // Have a guess: I bet the longest word is the domain :)
                // Order the array by the longest word, so google will always
                // come above com, co, uk, www, cdn, ww1, ww2 etc.
                usort($parts, "mysort");
                return $parts[0];
        }
    }
    
    function mysort($a, $b)
    {
        return strlen($b) - strlen($a); // longest string first
    }
    

    Add the two functions above to your libraries etc.

    Then use like so:

    $urls = array(
        'http://www.google.com/tomato',
        'http://int.google.com',
        'http://google.co.uk'
    );
    
    foreach($urls as $url)
    {
        echo determainDomainName($url) . "\n";
    }
    

    They will all echo google.

    See it running at http://codepad.org/pA5KWckb
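
    To see where the "longest word" guess holds and where it does not, here is a small sketch that isolates just that step. The function name longestLabel is hypothetical, and it assumes the host was already extracted with parse_url():

    ```php
    <?php
    // Hypothetical stand-in for the "longest label wins" guess.
    function longestLabel(string $host): string
    {
        $parts = explode('.', $host);
        usort($parts, function ($a, $b) {
            return strlen($b) - strlen($a); // longest first
        });
        return $parts[0];
    }

    echo longestLabel('www.google.com'), "\n"; // google
    echo longestLabel('images.ibm.com'), "\n"; // images, although ibm is the domain
    ```

    So the guess is fine for short subdomains such as www or cdn, but a subdomain longer than the domain itself wins the sort.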

  • 2021-01-27 09:02

    You can do this on a best-guess basis. The last part of the host name is always the TLD (plus an optional root dot), so you are basically looking for the preceding word that is longer than two letters:

    $url = "http://www.google.co.uk./search?q=..";
    
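    // "~" is the delimiter here because "#" starts the inline comments in /x mode
    preg_match("~http://
                (?:[^/]+\.)*       # cut off any preceding www*
                ([\w-]{3,})        # main domain name
                (\.\w\w)?          # two-letter second level domain .co
                \.\w+\.?           # TLD
                (/|:|$)            # end regex with / or : or string end
                ~x", 
          $url, $match);           // the domain name ends up in $match[1]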
    

    If you expect longer second-level domains (.com maybe?), add another \w. But this is not very generic; you would actually need a list of the TLDs where this is allowed.
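
    A rough sketch of the list-based variant, assuming you maintain the two-part endings yourself. The function name labelBeforeSuffix and the three sample endings are made up for illustration; a real list (for example the Public Suffix List) is far longer:

    ```php
    <?php
    // Hypothetical helper: returns the label just left of the suffix, or null.
    // $twoPartSuffixes is a tiny illustrative sample, not a complete list.
    function labelBeforeSuffix(string $url, array $twoPartSuffixes = ['co.uk', 'org.uk', 'com.au']): ?string
    {
        $host = parse_url($url, PHP_URL_HOST);
        if (!is_string($host) || $host === '') {
            return null; // no host, e.g. the URL had no scheme
        }

        // Normalise case and drop the optional root dot ("google.co.uk.").
        $labels = explode('.', strtolower(rtrim($host, '.')));
        $n = count($labels);
        if ($n < 2) {
            return null; // bare host name such as "localhost"
        }

        // The suffix is two labels long if the last two are in the list, else one.
        $lastTwo = $labels[$n - 2] . '.' . $labels[$n - 1];
        $suffixLen = in_array($lastTwo, $twoPartSuffixes, true) ? 2 : 1;

        // Null when the host is nothing but a suffix, e.g. "co.uk".
        return $n > $suffixLen ? $labels[$n - $suffixLen - 1] : null;
    }

    var_dump(labelBeforeSuffix('http://www.google.co.uk./search?q=..')); // "google"
    var_dump(labelBeforeSuffix('http://int.google.com'));                // "google"
    ```

    Splitting on dots after parse_url() avoids the backtracking guesswork of the regex, at the cost of keeping the list up to date.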

  • 2021-01-27 09:07

    The answer here might be what you're looking for.

    Getting parts of a URL (Regex)
