domain regex split


I have some domains I want to split, but I can't figure out the regex...

I have:

http://www.google.com/tomato
http://int.google.c


                      
4 answers

甜味超标, 2021-01-27 08:49

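// '~' is the delimiter, so the slashes need no escaping.
// The dot in co\.uk is escaped, and the path is optional so that
// a URL such as http://google.co.uk (no trailing slash) also matches.
$res = preg_replace('~^https?://(?:[a-z0-9_-]+\.)*([a-z0-9_-]+)\.(?:com|co\.uk|net)(?:/.*)?$~im', '$1', $in);


Add as many endings as you know.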
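
If the list of endings grows, it may be easier to keep it in an array and build the alternation from it. A minimal sketch, assuming a hypothetical helper named extractDomainLabel and an illustrative $endings list (neither is from the answer above):

```php
<?php
// Hypothetical helper: builds the pattern from a list of known endings
// and returns the label right before the ending, or null on no match.
function extractDomainLabel(string $url, array $endings): ?string
{
    // preg_quote() escapes the dots in entries such as "co.uk";
    // '~' is passed because it is the pattern delimiter used below.
    $quoted = array_map(fn($ending) => preg_quote($ending, '~'), $endings);
    $alternation = implode('|', $quoted);

    // The lazy *? tries the shortest subdomain prefix first, so a longer
    // listed ending ("co.uk") is preferred over a shorter one.
    $pattern = '~^https?://(?:[a-z0-9-]+\.)*?([a-z0-9-]+)\.(?:' . $alternation . ')(?:[/:?#].*)?$~i';

    return preg_match($pattern, $url, $m) ? $m[1] : null;
}

$endings = ['com', 'co.uk', 'net']; // illustrative only, extend as needed

echo extractDomainLabel('http://www.google.com/tomato', $endings), "\n"; // google
echo extractDomainLabel('http://google.co.uk', $endings), "\n";          // google
```

URLs whose ending is not in the list return null, so the list still has to be maintained by hand.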

Edit: made a mistake :-(
                                                                        
                                                        
            
            
              
                
时光取名叫无心, 2021-01-27 08:50

Why are you trying to use a regex? There are plenty of native functions available for this, such as:

$host = parse_url($url, PHP_URL_HOST);
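
Note that parse_url() only gives you the whole host name, so the labels still have to be separated afterwards. A small sketch of that first step, assuming a hypothetical helper named hostLabels (not part of PHP or of this answer):

```php
<?php
// Hypothetical helper: returns the dot-separated labels of a URL's host,
// or an empty array when parse_url() finds no host component.
function hostLabels(string $url): array
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return [];
    }
    // Lower-case the host and drop an optional trailing root dot.
    return explode('.', strtolower(rtrim($host, '.')));
}

print_r(hostLabels('http://www.google.com/tomato')); // www, google, com
print_r(hostLabels('www.google.com/tomato'));        // empty: no scheme, so no host
```

The second call shows a common pitfall: without a scheme, parse_url() treats the whole string as a path.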




Update: give this a go. It may need improving, but it's better than a regex, in my opinion.

function determainDomainName($url)
{
    $hostname = parse_url($url, PHP_URL_HOST);
    if (!is_string($hostname) || $hostname === '') {
        return null; // no host part found in the URL
    }
    $parts = explode(".", $hostname);

    switch (count($parts)) {
        case 1:
            return $parts[0]; // bare host name without any dot
        case 2:
            return $parts[0]; // e.g. google.com: the name comes right before the TLD
        case 3:
            if ($parts[0] == "www") { // the most common subdomain
                return $parts[1]; // bypass the subdomain, return the next segment
            }
            if ($parts[1] == "co") { // first segment of a double-barrel TLD; in_array() could cover more
                return $parts[0]; // bypass the double-barrel TLD
            }
            // anything else falls through to the guess below
        default:
            // Have a guess: I bet the longest word is the domain :)
            // Sorting longest-first puts google above
            // com, co, uk, www, cdn, ww1, ww2 etc.
            usort($parts, "mysort");
            return $parts[0];
    }
}

// Comparison callback for usort(): longer strings first.
function mysort($a, $b)
{
    return strlen($b) - strlen($a);
}


Add the 2 functions above to your library.

Then use like so:

$urls = array(
    'http://www.google.com/tomato',
    'http://int.google.com',
    'http://google.co.uk'
);

foreach($urls as $url)
{
    echo determainDomainName($url) . "\n";
}


They will all echo google

see @ http://codepad.org/pA5KWckb
                                                                        
                                                        
            
            
              
                
情话喂你, 2021-01-27 09:02

You can do this on a best-guess basis. The last part of the host name is always the TLD (optionally followed by the root dot), so you are basically looking for a preceding word that is longer than 2 letters:

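$url = "http://www.google.co.uk./search?q=..";

// '~' is the delimiter here, which leaves '#' free for the inline comments.
preg_match('~http://
            (?:[^/]+\.)*       # cut off any preceding www.*
            ([\w-]{3,})        # main domain name
            (\.\w\w)?          # optional two-letter second-level domain, e.g. .co
            \.\w+\.?           # TLD, with optional root dot
            (/|:|$)            # end regex with / or : or string end
            ~x',
      $url, $match);

echo $match[1]; // google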


If you expect longer second-level domains (.com, maybe?), then add another \w. But this is not very generic; you would actually need a list of the TLDs where this is allowed.
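
To illustrate the list idea, here is a rough sketch that checks the last two labels of the host against a small hand-written list. The constant MULTI_LABEL_SUFFIXES, its three entries, and the function name registrableLabel are assumptions made up for this example; a real list (for instance the Public Suffix List) is far longer.

```php
<?php
// Hypothetical and deliberately tiny list of two-label suffixes.
const MULTI_LABEL_SUFFIXES = ['co.uk', 'org.uk', 'com.au'];

// Returns the label directly in front of the suffix, or null if there is no host.
function registrableLabel(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }

    // Normalise: lower-case, and drop an optional trailing root dot.
    $labels = explode('.', strtolower(rtrim($host, '.')));
    $count = count($labels);
    if ($count < 2) {
        return $labels[0]; // single-label host such as "localhost"
    }

    // The suffix spans two labels only when the pair is in the list
    // and at least one more label remains in front of it.
    $lastTwo = $labels[$count - 2] . '.' . $labels[$count - 1];
    $suffixLength = ($count >= 3 && in_array($lastTwo, MULTI_LABEL_SUFFIXES, true)) ? 2 : 1;

    return $labels[$count - $suffixLength - 1];
}

echo registrableLabel('http://www.google.co.uk./search?q=..'), "\n"; // google
echo registrableLabel('http://www.google.com/tomato'), "\n";         // google
```

This trades the regex for parse_url() plus a lookup, so the only part that needs maintaining is the list itself.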
                                                                        
                                                        
            
            
              
                
一向, 2021-01-27 09:07

The answer here might be what you're looking for.

Getting parts of a URL (Regex)
                                                                        
                                                        
            
            
              
                